

EMPIRICAL  BAYES  ESTIMATION  OF  THE  RESPONSE  FUNCTION  AND 
MULTIVARIATE  REGRESSION  MODEL 


By 

LI-CHU  LEE 


A DISSERTATION  PRESENTED  TO  THE  GRADUATE  SCHOOL 
OF  THE  UNIVERSITY  OF  FLORIDA  IN 
PARTIAL  FULFILLMENT  OF  THE  REQUIREMENTS 
FOR  THE  DEGREE  OF  DOCTOR  OF  PHILOSOPHY 


UNIVERSITY  OF  FLORIDA 


1989 


ACKNOWLEDGEMENTS 


The  author  would  like  to  express  her  sincere  gratitude  to  her  advisor,  Dr.  Malay 
Ghosh,  for  his  ideas,  advice  and  support  throughout  this  study.  The  author  is  also 
grateful  to  Dr.  Michael  A.  DeLorenzo,  Dr.  Ramon  C.  Littell,  Dr.  Kenneth  M. 
Fortier,  and  Dr.  Pejaver  V.  Rao  for  their  participation  as  members  of  her  graduate 
committee. 

Some  valuable  data  and  suggestions  were  provided  by  Dr.  Ramon  C.  Littell. 
His  help  is  gratefully  acknowledged.  The  author  wishes  also  to  thank  Dr.  Kenneth 
M. Fortier and Mr. Doug Reese for their programming suggestions.

Finally,  the  author  would  like  to  thank  her  husband  Li-Hwa  Lin  and  her  parents 
for  their  support  and  encouragement. 




TABLE  OF  CONTENTS 


ACKNOWLEDGEMENTS

LIST OF TABLES

ABSTRACT

CHAPTERS

1 INTRODUCTION
  1.1 Background
  1.2 Outline of Work

2 EMPIRICAL AND HIERARCHICAL BAYES ESTIMATION OF THE RESPONSE FUNCTION
  2.1 Introduction
  2.2 The Empirical Bayes Approach
  2.3 The Hierarchical Bayes Approach
  2.4 Numerical Examples

3 EMPIRICAL BAYES ESTIMATION OF MULTIVARIATE REGRESSION MODEL I
  3.1 Introduction
  3.2 The Empirical Bayes Approach
  3.3 Empirical Bayes Subset Estimators
  3.4 Numerical Examples

4 EMPIRICAL BAYES ESTIMATION OF MULTIVARIATE REGRESSION MODEL II
  4.1 Introduction
  4.2 The Empirical Bayes Approach
  4.3 Empirical Bayes Subset Estimators
  4.4 Numerical Examples

5 CONCLUSIONS
  5.1 Summary and Conclusions
  5.2 Future Studies

BIBLIOGRAPHY

BIOGRAPHICAL SKETCH




LIST  OF  TABLES 


2.1 The estimated frequentist risk results for 3^ factorial experiment

2.2 The estimated frequentist risk results for 2^ factorial experiment
    replicated two times

2.3 The estimated Bayes risk results for 3^ factorial experiment

2.4 The estimated Bayes risk results for 2^ factorial experiment
    replicated two times

2.5 Six-week heights of chrysanthemum plants in pots treated with
    two growth retardant chemicals at six rates of application

2.6 Sum of squared errors and the Euclidean distances for different
    estimators

2.7 Predicted values for different estimators

3.1 The estimated frequentist risk results for 2^ factorial experiment

3.2 The estimated Bayes risk results for 2^ factorial experiment

4.1 The estimated frequentist risk results for 2^ factorial experiment

4.2 The estimated Bayes risk results for 2^ factorial experiment




Abstract  of  Dissertation  Presented  to  the  Graduate  School 
of  the  University  of  Florida  in  Partial  Fulfillment  of  the 
Requirements  for  the  Degree  of  Doctor  of  Philosophy 

EMPIRICAL  BAYES  ESTIMATION  OF  THE  RESPONSE  FUNCTION  AND 
MULTIVARIATE  REGRESSION  MODEL 

By 

LI-CHU  LEE 
May  1989 

Chairman:  Dr.  Malay  Ghosh 
Major  Department:  Statistics 

In the analysis of data from designed experiments, the problem is often to
estimate the response function. In doing so, there is often a choice between two
nested regression models, for example, one involving the main effects and the
interactions, and the other involving the main effects only. Another example involves
one model having intercept, linear and quadratic terms, while the other model has
only intercept and linear terms. We consider empirical Bayes and hierarchical Bayes
estimators of the response function in univariate regression models and empirical
Bayes estimators of the response function in multivariate regression models.

The proposed estimators are compromises between the least squares estimators
for the full and reduced models, as well as between the least squares estimator for
the full model and a weighted average of the least squares estimators for the reduced
model. The shrinkage coefficients of these estimators are data dependent. Several
theorems are given which describe the frequentist as well as the Bayesian
properties of these proposed estimators. Some modified empirical Bayes estimators are
developed and compared with the preliminary test estimators. Numerical examples
are given which illustrate the estimation methods.




CHAPTER  1 
INTRODUCTION 


1.1  Background 

In his seminal 1956 Berkeley Symposium paper, Stein presented the surprising
phenomenon that the sample mean is an inadmissible estimator of the normal
population mean under squared error loss in dimensions higher than two. Later,
James and Stein (1961) produced an explicit estimator dominating the sample mean.
Since the sample mean is a minimax estimator of the normal population mean
under squared error loss, any estimator dominating the sample mean is also minimax.
Since then many articles have been devoted to finding classes of minimax
estimators which dominate the sample mean. Mention should be made of the articles
of Baranchik (1970), Strawderman (1971), Bock (1975), Efron and Morris (1976),
Berger (1976) and Berger (1982), among others.

In the seventies, Efron and Morris, in a series of articles, gave an interesting
empirical Bayes interpretation of the James-Stein estimator. An empirical Bayes
estimator is obtained from the corresponding Bayes estimator by estimating its
unknown parameters from the data. This opened up the possibility of applying
the James-Stein estimator to many data-analysis problems, whenever there was a
need to ‘borrow strength from the ensemble’. This important aspect of the
James-Stein estimators was illustrated in the articles of Efron and Morris (1975) with real
data examples.

Stein (1960) also pointed out that the least squares estimators in a general linear
regression model could be improved by certain ‘shrinkage estimators’ under squared
error loss. A class of minimax estimators dominating the least squares estimators
under squared error loss was proposed in the articles of Sclove (1968) and Baranchik
(1973). These articles, however, did not point out any empirical Bayes aspect of
these minimax estimators.

Empirical Bayes motivation of James-Stein type shrinkage estimators dominating
the least squares estimators in linear regression models is alluded to in Arnold’s
(1981) book (see Chapter 11). This phenomenon was brought out more explicitly
in a recent article of Judge, Hill and Bock (1988).

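The subject matter of this thesis stems more directly from an article of Ghosh,
Saleh and Sen (1987). These authors proposed an empirical Bayes estimator of
$\beta_1$ in the general regression model

$$Y = X_1\beta_1 + X_2\beta_2 + e, \tag{1.1}$$

with $e \sim N_n(0, \sigma^2 I)$, where it was suspected that $\beta_2 = 0$. The prior distribution of
$\beta = (\beta_1^T, \beta_2^T)^T$ was taken as $N_p(\nu, \tau^2 (X^T X)^{-1})$ with $\nu^T = (\nu_1^T, 0^T)$, where $\nu_1$ was a
$p_1 \times 1$ vector with $p_1 < p$, and $p_2 = p - p_1$ is the dimension of $\beta_2$. The empirical
Bayes estimator of $\beta_1$,

$$e_{EB}(Y) = \tilde\beta_1 + \left(1 - \frac{(p_2 - 2)(n-p)S_e^2}{(n-p+2)(\hat\beta_2^T C_{22.1}\hat\beta_2)}\right)(\hat\beta_1 - \tilde\beta_1), \tag{1.2}$$

where $S_e^2 = \|Y - X\hat\beta\|^2/(n-p)$, served as a compromise between the least
squares estimator (LSE) $\hat\beta_1$ of $\beta_1$ for the full model and the reduced least squares
estimator $\tilde\beta_1$ under the hypothesis $H_0: \beta_2 = 0$. Here $C_{22.1} = C_{22} - C_{21}C_{11}^{-1}C_{12}$ with
$C_{ij} = X_i^T X_j$ and $X = (X_1\ X_2)$. The compromise estimator leaned more
towards $\tilde\beta_1$ when the data suggested that $H_0$ is true, and towards $\hat\beta_1$ otherwise.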
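
As an illustration only, the data-dependent weight in (1.2) and (1.3) can be sketched in a few lines of Python. The function and argument names below are hypothetical and not from the source; the quadratic form and the residual mean square are assumed to have been computed beforehand.

```python
def eb_shrinkage_factor(quad_form, s2_e, n, p, p2, c=None):
    """Weight placed on (beta1_hat - beta1_tilde) in (1.2)/(1.3).

    quad_form : beta2_hat' C_{22.1} beta2_hat (assumed precomputed)
    s2_e      : residual mean square ||Y - X beta_hat||^2 / (n - p)
    c         : constant in (1.3); defaults to p2 - 2, which gives (1.2)
    """
    if c is None:
        c = p2 - 2
    return 1.0 - c * (n - p) * s2_e / ((n - p + 2) * quad_form)


def compromise_estimate(beta1_hat, beta1_tilde, weight):
    """Componentwise beta1_tilde + weight * (beta1_hat - beta1_tilde)."""
    return [bt + weight * (bh - bt) for bh, bt in zip(beta1_hat, beta1_tilde)]


# Hypothetical numbers: n = 20, p = 5, p2 = 3, so c = p2 - 2 = 1.
w = eb_shrinkage_factor(quad_form=30.0, s2_e=1.0, n=20, p=5, p2=3)
estimate = compromise_estimate([2.0, 4.0], [1.0, 3.0], w)
```

A large quadratic form (strong evidence against $H_0$) pushes the weight towards 1, that is, towards the full-model LSE. A small quadratic form can make the weight negative, which is the situation positive part versions of such estimators are meant to handle.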
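The optimality of $e_{EB}(Y)$ was proved within the class of estimators of the form

$$e_c(Y) = \tilde\beta_1 + \left(1 - \frac{c(n-p)S_e^2}{(n-p+2)(\hat\beta_2^T C_{22.1}\hat\beta_2)}\right)(\hat\beta_1 - \tilde\beta_1), \tag{1.3}$$

where $c$ is a constant, and $e_{EB}(Y) = e_{p_2-2}(Y)$. To be specific, under the loss

$$L(\beta_1, a) = (a - \beta_1)^T Q (a - \beta_1)/\sigma^2, \tag{1.4}$$

where $Q$ is a positive definite weight matrix, the Bayes risk of $e_c$ under the
aforementioned prior (to be denoted by $\xi$) is given by

$$r(\xi, e_c) = (1-B)\,\mathrm{tr}(Q C_{11.2}^{-1}) + B\left[\mathrm{tr}(Q C_{11}^{-1}) + \left(1 - \frac{2c(n-p)}{p_2(n-p+2)} + \frac{c^2(n-p)}{p_2(n-p+2)(p_2-2)}\right)\mathrm{tr}(W C_{22.1}^{-1})\right], \tag{1.5}$$

where $B = \sigma^2/(\tau^2 + \sigma^2)$, $C_{11.2} = C_{11} - C_{12}C_{22}^{-1}C_{21}$, and $W = C_{21}C_{11}^{-1}QC_{11}^{-1}C_{12}$. The Bayes
risk is the expected value of the loss (1.4) based on the marginal distribution of $Y$.
The risk $r(\xi, e_c)$ is minimized at $c = p_2 - 2$, and $e_{EB} = e_{p_2-2}$. Note that $C = X^T X$
and $C_{ij} = X_i^T X_j$ $(i, j = 1, 2)$.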

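Comparison of the Bayes risk of $e_{EB}$ with those of $\hat\beta_1$ and $\tilde\beta_1$ shows that

$$r(\xi, \hat\beta_1) - r(\xi, e_{EB}) = \frac{B(n-p)(p_2-2)}{(n-p+2)p_2}\,\mathrm{tr}(W C_{22.1}^{-1}) > 0, \tag{1.6}$$

so that the Bayes risk of $e_{EB}$ is smaller than that of $\hat\beta_1$, and

$$r(\xi, \tilde\beta_1) - r(\xi, e_{EB}) = \left\{(1-B)^2B^{-1} - B\left(1 - \frac{(n-p)(p_2-2)}{(n-p+2)p_2}\right)\right\}\mathrm{tr}(W C_{22.1}^{-1}), \tag{1.7}$$

so that the Bayes risk of $e_{EB}$ is smaller than that of $\tilde\beta_1$ if and only if
$(1-B)^2B^{-2} > 1 - (n-p)(p_2-2)/\{(n-p+2)p_2\}$. The condition stated in (1.7) is expected since
whenever the prior variance $\tau^2$ is much smaller than the sampling variance $\sigma^2$,
$B$ is close to 1, and in that case $\tilde\beta_1$ is expected to perform better than $e_{EB}$ since the
prior mean of $\beta_2$ is set equal to zero.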
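
The role of $c$ in the Bayes risk (1.5) and the comparison following (1.7) can be checked numerically. The sketch below is only an illustration under the formulas as stated here; the function names are hypothetical.

```python
def risk_factor(c, n, p, p2):
    """Bracketed multiplier of B * tr(W C_{22.1}^{-1}) in (1.5), as a function of c."""
    m = n - p
    return (1.0
            - 2.0 * c * m / (p2 * (m + 2))
            + c * c * m / (p2 * (m + 2) * (p2 - 2)))


def eb_beats_reduced(B, n, p, p2):
    """Condition following (1.7): True when e_EB has the smaller Bayes risk."""
    m = n - p
    return ((1.0 - B) / B) ** 2 > 1.0 - m * (p2 - 2) / ((m + 2) * p2)


# Hypothetical design sizes: n = 30, p = 6, p2 = 4, so the minimizer is c = 2.
grid = [k / 10.0 for k in range(0, 41)]
best_c = min(grid, key=lambda c: risk_factor(c, 30, 6, 4))
```

Since the multiplier is a quadratic in $c$ with positive leading coefficient, the grid search simply recovers the closed-form minimizer $c = p_2 - 2$. The second function shows how the comparison with the reduced-model estimator flips as $B$ moves towards 1.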

Let $F = (n-p+2)\,\hat\beta_2^T C_{22.1}\hat\beta_2 / \{(n-p)S_e^2\}$, which is a multiple of the usual F statistic for
testing $H_0: \beta_2 = 0$, and

$$e_\phi(Y) = \tilde\beta_1 + \left(1 - \frac{\phi(F)}{F}\right)(\hat\beta_1 - \tilde\beta_1), \tag{1.8}$$

which is the general class of estimators of $\beta_1$. Then under the following three
conditions

(i) $2\,ch_1(W C_{22.1}^{-1}) < \mathrm{tr}(W C_{22.1}^{-1})$, where $ch_1(W C_{22.1}^{-1})$ denotes the largest
eigenvalue of $W C_{22.1}^{-1}$;

(ii) $0 < \phi(F) < 2\{\mathrm{tr}(W C_{22.1}^{-1})/ch_1(W C_{22.1}^{-1}) - 2\}$;

(iii) $\phi(F)$ increasing monotonically in $F$,

it is found by Ghosh, Saleh, and Sen (1987) that the frequentist risk of $e_\phi$ is less
than that of the minimax estimator $\hat\beta_1$ of $\beta_1$ under the loss described in (1.4).

Note that $\phi(F)$ is a general function of $F$, and the frequentist risk of an estimator
of $\beta_1$ is the expected value of the loss (1.4) for a fixed given value of $\beta$.
When $\phi(F) = p_2 - 2$ and

$$0 < p_2 - 2 < 2\{\mathrm{tr}(W C_{22.1}^{-1})/ch_1(W C_{22.1}^{-1}) - 2\}, \tag{1.9}$$

then if condition (i) holds, the empirical Bayes estimator will dominate $\hat\beta_1$ and will
be a minimax estimator of $\beta_1$.

When both condition (i) and the condition described in (1.9) hold, the
modified empirical Bayes estimators,

$$e_{EBM} = \hat\beta_1 - \{1 - (1 - d/F)\,g(F)\}(\hat\beta_1 - \tilde\beta_1), \tag{1.10}$$

with $g(F) = I_{[F \ge d]}$, $d$ representing some positive constant, and $I$ denoting the usual
indicator function, were shown to dominate the preliminary test estimators,

$$\hat\beta_{PTE} = \hat\beta_1 - \{1 - g(F)\}(\hat\beta_1 - \tilde\beta_1), \tag{1.11}$$

which were originally proposed by Bancroft (1944).
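
To make the contrast between the two estimators concrete, the sketch below writes each as $\tilde\beta_1 + w(F)(\hat\beta_1 - \tilde\beta_1)$ and returns the weight $w(F)$. It assumes the indicator has the form $g(F) = I[F \ge d]$ and that the modified estimator shrinks by $d/F$ beyond the cutoff; both are readings of the formulas above rather than confirmed details, and the function names are hypothetical.

```python
def pte_weight(F, d):
    """Weight on the full-model LSE in the preliminary test estimator.

    All-or-nothing: 0 if H0 is accepted (F < d), 1 if it is rejected.
    """
    return 1.0 if F >= d else 0.0


def modified_eb_weight(F, d):
    """Weight on the full-model LSE in the modified EB estimator.

    Assumed form (1 - d/F) * I[F >= d]: equal to 0 at F = d and
    increasing smoothly towards 1 as F grows.
    """
    return (1.0 - d / F) if F >= d else 0.0


# Hypothetical cutoff d = 2: compare the two weights just above it.
weights = [(F, pte_weight(F, 2.0), modified_eb_weight(F, 2.0))
           for F in (1.0, 2.0, 2.5, 4.0, 20.0)]
```

Under these assumptions the preliminary test weight jumps from 0 to 1 at the cutoff, whereas the modified weight is continuous there, so an F value barely above $d$ moves the estimate only slightly away from the reduced-model fit.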

The prior distribution used by Ghosh, Saleh and Sen (1987) to motivate their
empirical Bayes estimators (see also Judge, Hill and Bock, 1988) has been called a
g-prior by Zellner (1986). Zellner argued that g-priors could be viewed as reference
informative prior distributions.

The conceptual idea behind the g-priors as described by Zellner (1986) is
as follows.

Before observing $Y$, consider a ‘conceptual’ or ‘imaginary’ sample $Y_0$, an $n \times 1$
vector, assumed to be generated by

$$Y_0 = X\beta + u, \tag{1.12}$$

where $u$ is distributed as $N(0, \sigma_0^2 V)$, and $X$ is an $n \times p$ design matrix in the model
of interest. The likelihood function for Eq. (1.12) is given by

$$L(\beta, \sigma_0 \mid Y_0) \propto \sigma_0^{-n}\exp\{-(Y_0 - X\beta)^T V^{-1}(Y_0 - X\beta)/2\sigma_0^2\} \tag{1.13}$$

$$\propto \sigma_0^{-n}\exp\{-[(n-p)s_0^2 + (\beta - \hat\beta_0)^T X^T V^{-1} X(\beta - \hat\beta_0)]/2\sigma_0^2\},$$

where

$$\hat\beta_0 = (X^T V^{-1} X)^{-1}X^T V^{-1} Y_0,$$

and

$$(n-p)s_0^2 = (Y_0 - X\hat\beta_0)^T V^{-1}(Y_0 - X\hat\beta_0).$$

Little is known about the values of $\beta$ and $\sigma_0$. Thus, one may use the
diffuse prior (see, for example, Jeffreys, 1967)

$$p(\beta, \sigma_0) \propto 1/\sigma_0. \tag{1.14}$$

This implies that the elements of $\beta$ and $\log\sigma_0$ are independently and uniformly
distributed. By combining Eqs. (1.13) and (1.14), the posterior distribution of $\beta$
for given $\sigma_0$ will be normal with the mean $\hat\beta_0$ equal to $(X^T V^{-1} X)^{-1}X^T V^{-1} Y_0$
and the variance-covariance matrix equal to $(X^T V^{-1} X)^{-1}\sigma_0^2$. Note that the
posterior variance-covariance matrix is proportional to the variance-covariance matrix
of the LSE $\hat\beta = (X^T V^{-1} X)^{-1}X^T V^{-1} Y$ of $\beta$. Thus, while carrying out the
actual Bayesian analysis, the prior variance-covariance matrix is taken proportional
to $(X^T V^{-1} X)^{-1}$.

The subject matter of this thesis first focuses on the response function in
univariate regression models $Y = X\beta + e$ (Khuri and Cornell, 1987). It then
focuses on the multivariate general linear regression models which are often useful
in growth and response curve studies. Empirical Bayes estimators for the
multivariate general linear model where the individual regression coefficients $\beta_k$'s are
random effects were proposed by Reinsel (1985), who placed particular emphasis on
the estimators' mean squared error (MSE) matrix. His linear model for $m$ distinct
individuals was of the form

$$Y_k = X_k\beta_k + e_k, \quad k = 1, \dots, m. \tag{1.15}$$

The vectors $e_1, \dots, e_m$ were assumed to be independently distributed as
$N(0, \sigma^2 I)$. It was assumed that the coefficients $\beta_k$, with dimension $p \times 1$, were
composed of both fixed effects that incorporate concomitant information and individual
random effects such that

$$\beta_k = B a_k + \lambda_k, \quad k = 1, \dots, m, \tag{1.16}$$

where $A = (a_1, \dots, a_m)$ is an $r \times m$ ‘across individual’ design matrix of constants
of full rank $r < m$, whose $k$-th column represents the values of the concomitant
‘background’ regressor variables associated with the $k$-th individual, and $B$ is a $p \times r$
matrix of unknown parameters. The individual random effects $\lambda_k$ are assumed to
be distributed as $N(0, \Sigma_\lambda)$, independent over individuals and independent of the
errors $e_k$. Eq. (1.16) can also be viewed as a prior distribution of $\beta_k$. For the
case that $X_k = X$ for all $k$ and all parameters $B$, $\sigma^2$ and $\Sigma_\lambda$ are unknown, the
proposed empirical Bayes estimator of $\beta_k$ for the $k$-th individual dominated the
usual least squares estimator of $\beta_k$, with a smaller individual (unconditional) MSE
matrix. According to Eqs. (1.15) and (1.16), the Bayes estimator of $\beta_k$, $e_B(Y_k)$, is
the posterior mean,

$$e_B(Y_k) = E(\beta_k \mid Y_k) = \hat\beta_k - V_k(V_k + \Sigma_\lambda)^{-1}(\hat\beta_k - B a_k), \tag{1.17}$$

where $\hat\beta_k = (X_k^T X_k)^{-1}X_k^T Y_k$, the least squares estimator of $\beta_k$, and $V_k = \sigma^2(X_k^T X_k)^{-1}$.
Hence, $e_B(Y_k)$ is the minimum MSE estimator of $\beta_k$ with the unconditional MSE
matrix given by

$$E[(e_B(Y_k) - \beta_k)(e_B(Y_k) - \beta_k)^T] = V_k - V_k(V_k + \Sigma_\lambda)^{-1}V_k. \tag{1.18}$$

Under the models (1.15) and (1.16) with all $X_k$ equal to $X$, the empirical Bayes
estimator has the following form:

$$e_{EB}(Y_k) = \hat\beta_k - \hat V(\hat V + \hat\Sigma_\lambda)^{-1}(\hat\beta_k - \hat B a_k), \tag{1.19}$$

where $\hat\beta_k = (X^T X)^{-1}X^T Y_k$, $\hat V = \{m(n-p)\}^{-1}S_e^2\,(X^T X)^{-1}$ with
$S_e^2 = \sum_{k=1}^m Y_k^T(I - X(X^T X)^{-1}X^T)Y_k$, $\hat B = \hat\beta A^T(AA^T)^{-1}$, $\hat\beta = (\hat\beta_1, \dots, \hat\beta_m)$, and
$\hat\Sigma_\lambda = (m-r)^{-1}\hat\beta(I - A^T(AA^T)^{-1}A)\hat\beta^T - \hat V$. Now consider estimators of the individual $\beta_k$ of the form

$$e_c(Y_k) = \hat\beta_k - c\hat V(\hat V + \hat\Sigma_\lambda)^{-1}(\hat\beta_k - \hat B a_k), \tag{1.20}$$

where $c$ is an arbitrary constant. The unconditional MSE matrix of the estimator
$e_c(Y_k)$ in Eq. (1.20) for any individual $k$ is given by

$$E[(e_c(Y_k) - \beta_k)(e_c(Y_k) - \beta_k)^T] = V + (1 - a_k^T(AA^T)^{-1}a_k)\left\{\frac{c^2(m-r)(m(n-p)+2)}{(m-r-p-1)m(n-p)} - 2c\right\}V(V + \Sigma_\lambda)^{-1}V. \tag{1.21}$$

The minimum value of (1.21) with respect to $c$ occurs for
$c = (m-r-p-1)m(n-p)/\{(m-r)(m(n-p)+2)\}$, with the minimum MSE matrix value of

$$V - (1 - a_k^T(AA^T)^{-1}a_k)\,\frac{(m-r-p-1)m(n-p)}{(m-r)(m(n-p)+2)}\,V(V + \Sigma_\lambda)^{-1}V. \tag{1.22}$$

Note from (1.22) that the MSE matrix of the estimator (1.20) with the optimal
value of $c$ is smaller than that of the individual least squares estimator for every
individual $k$, provided that $m > r + p + 1$.
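
The dependence of the MSE matrix (1.21) on $c$ enters only through a scalar multiplier, so the optimal constant can be checked with a short script. This is a sketch under the formulas as written above; the function names are hypothetical.

```python
def optimal_c(m, n, p, r):
    """Minimizer of (1.21) in c; requires m > r + p + 1."""
    if m <= r + p + 1:
        raise ValueError("requires m > r + p + 1")
    nu = m * (n - p)  # error degrees of freedom pooled over the m individuals
    return (m - r - p - 1) * nu / ((m - r) * (nu + 2))


def mse_coefficient(c, m, n, p, r):
    """Scalar in braces in (1.21); negative values mean improvement over the LSE."""
    nu = m * (n - p)
    return c * c * (m - r) * (nu + 2) / ((m - r - p - 1) * nu) - 2.0 * c


# Hypothetical sizes: m = 20 individuals, n = 8 observations each, p = 3, r = 2.
c_star = optimal_c(20, 8, 3, 2)
best = mse_coefficient(c_star, 20, 8, 3, 2)
```

At the optimum the multiplier equals $-c^*$, and it stays negative for every $c$ strictly between 0 and $2c^*$, which is the range over which the shrinkage estimator improves on the individual least squares estimator in this expression.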


Reinsel (1985) also investigated the MSE matrix properties of the estimators
$e_c(Y_k)$ in the frequentist sense (i.e., for fixed given values of $\beta_1, \dots, \beta_m$). The estimators
$e_c(Y_k)$ are appropriate in this setting as long as the concomitant variables
$a_k$ are capable of explaining a significant portion of the variation among the values
$\beta_k$. Now, consider only the case of equal individual design matrices $X_k = X$. Let
$E_\beta$ denote expectation conditional on the values of $\beta = (\beta_1, \dots, \beta_m)$. Then

$$\frac{1}{m}\sum_{k=1}^m E_\beta[(e_c(Y_k) - \beta_k)(e_c(Y_k) - \beta_k)^T] = V + \left\{\frac{c^2(m-r)^2(m(n-p)+2)}{m(m(n-p))} - \frac{2c(m-r)}{m}(m-r-p-1)\right\} V E_\beta(S_\lambda^{-1})V, \tag{1.23}$$

where $S_\lambda = \hat\beta(I - A^T(AA^T)^{-1}A)\hat\beta^T$. As with the unconditional expression, the
preceding MSE matrix expression is minimized by the choice
$c = (m-r-p-1)m(n-p)/\{(m-r)(m(n-p)+2)\}$, with the minimum value for Eq. (1.23) equal to

$$V - [(m-r-p-1)^2 m(n-p)/\{m(m(n-p)+2)\}]\,V E_\beta(S_\lambda^{-1})V. \tag{1.24}$$

Thus the estimator $e_c(Y_k)$ dominates the individual LSE $\hat\beta_k$ in terms of the average
MSE matrix conditionally for any fixed values of $\beta_1, \dots, \beta_m$, provided $m > r + p + 1$
and $0 < c < 2(m-r-p-1)m(n-p)/\{(m-r)(m(n-p)+2)\}$. Reinsel
(1985) also investigated this procedure in certain mixed models and in more general
multiparameter situations such as the multivariate one-way random effects model.

1.2  Outline  of  Work 


In response surface analysis, estimation of response functions is often of interest.
In carrying out the estimation, there may be a choice between the following two nested
regression models:

$$Y = X_1\beta_1 + X_2\beta_2 + e, \tag{1.25}$$

$$Y = X_1\beta_1 + e, \tag{1.26}$$

where $e \sim N(0, V)$. For example, in factorial experiments, one may want to select
one of the two models, the first involving the main effects as well as the interaction
effects, and the second involving only the main effects. For a given set of data,
using the least squares method, the full model always produces a fit at least as good as
that of the reduced model. However, this may not be the case for estimating $X\beta$
($X = (X_1\ X_2)$, $\beta = (\beta_1^T, \beta_2^T)^T$). More specifically, assuming $X^T V^{-1} X$ to be nonsingular, if
$\hat\beta$ is the LSE of $\beta$ based on the full model, and $\tilde\beta_1$ is the LSE of $\beta_1$ for the
reduced model, then $\|Y - X\hat\beta\|^2 \le \|Y - X_1\tilde\beta_1\|^2$. This does not necessarily imply
that $\|X\hat\beta - X\beta\|^2 \le \|X_1\tilde\beta_1 - X\beta\|^2$ or that $E\|X\hat\beta - X\beta\|^2 \le E\|X_1\tilde\beta_1 - X\beta\|^2$
for all $\beta_1$ and $\beta_2$ under the usual normal model.
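
A small calculation shows why a better fit need not mean a better estimate of $X\beta$. The sketch below assumes the simplest case of independent errors with common variance $\sigma^2$ (an assumption made here only for illustration), where the two expected squared errors have standard closed forms; the function name and the quantity `delta` are hypothetical notation.

```python
def prediction_risks(sigma2, p1, p2, delta):
    """Exact E||fit - X beta||^2 for the full and reduced least squares fits.

    Assumes errors with covariance sigma2 * I.
    delta = ||(I - P1) X2 beta2||^2, where P1 projects onto the columns of X1,
    i.e. the part of X2 beta2 that the reduced model cannot reproduce.
    """
    full = sigma2 * (p1 + p2)       # pure variance: one sigma2 per fitted coefficient
    reduced = sigma2 * p1 + delta   # smaller variance plus squared bias
    return full, reduced


# Hypothetical case with p1 = 2, p2 = 3, sigma2 = 1.
under_h0 = prediction_risks(1.0, 2, 3, 0.0)     # beta2 = 0
far_from_h0 = prediction_risks(1.0, 2, 3, 10.0)
```

In this setting the reduced model has the smaller risk whenever `delta` is below $\sigma^2 p_2$, even though its residual sum of squares is never smaller than that of the full model.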

One classical way to resolve this problem is to test the null hypothesis
$H_0: \beta_2 = 0$, and then use the full model if the null hypothesis is rejected at a desired
level of significance, and use the reduced model otherwise. The above
procedure suffers from the drawback that there is no general way of incorporating
the degree of evidence for or against the null hypothesis in order to arrive at the
estimator. For instance, if the observed F value is very close to the critical value (at
a chosen level of significance), then there may be no persuasive reason for choosing
one model in preference to the other.

In Chapter 2, empirical Bayes (EB) and hierarchical Bayes (HB) estimators of $X\beta$
are developed. Instead of estimating the unknown parameters of the Bayes
estimator from the data, the hierarchical Bayes method puts a prior distribution,
often improper, on the unknown hyperparameters and arrives at the posterior
distribution of $\beta$ given $Y = y$. Zellner’s (1986) g-prior is utilized to obtain a Bayes
estimator of $X\beta$. The structure of the prior mean is motivated by the null
hypothesis $H_0: \beta_2 = 0$. Empirical Bayes (EB), positive part EB, and modified
EB estimators are introduced in Section 2.2. Under a general quadratic loss, the
performance of the empirical Bayes estimators of $X\beta$ is evaluated using both the
frequentist and the Bayes criteria. Under certain conditions, it is shown that the
empirical Bayes estimators dominate the minimax estimator, the LSE of $X\beta$, in
a frequentist sense. Thus a group of minimax estimators of $X\beta$ is provided. The
empirical Bayes estimators are also shown to have smaller Bayes risks than the
LSE for the full model. It should be noted though that, like the empirical Bayes
estimators, the preliminary test estimators (PTE) serve as compromises between
the LSE's based on the full and the reduced models. The PTE is dominated by the
corresponding modified empirical Bayes estimator since the former does not take
account of the degree of evidence for or against the null hypothesis $H_0: \beta_2 = 0$.
Several theorems which describe the frequentist as well as the Bayesian properties
of the proposed estimators are given in Section 2.2. In Section 2.3 some important
HB estimators are introduced and the connection between the HB and EB
estimators is discussed. An advantage of the HB procedure over the EB procedure
is that the former can be used to construct credible sets for $X\beta$. Several
numerical examples including real data and simulated data are provided in Section 2.4 to
illustrate the applicability of the methods proposed in Sections 2.2 and 2.3.

This research further studies the multivariate linear regression model, which
is often useful in growth and response curve analyses (see, e.g., Reinsel, 1985).
Suppose there is a choice between the following multivariate regression models:

$$Y_k = X_{k1}\beta_{k1} + X_{k2}\beta_{k2} + e_k, \tag{1.27}$$

$$Y_k = X_{k1}\beta_{k1} + e_k, \tag{1.28}$$

where $e_k \sim N(0, V_k)$ for $k = 1, \dots, m$. The model given in (1.28) is based on the
guess that $\beta_{k2} = 0$ for all $k = 1, \dots, m$.

Instead, one can also compare the models (1.27) and

$$Y_k = X_{k1}\beta_1 + e_k, \quad k = 1, \dots, m. \tag{1.29}$$

The model (1.29) is not only based on the guess that $\beta_{k2} = 0$ for all $k = 1, \dots, m$ but
also on the guess that the $\beta_{k1}$ have a common value $\beta_1$.

Furthermore, for the problem of best fitting a given set of data by the
least squares method, the model in (1.27) behaves better than the reduced models
given in (1.28) and (1.29). However, when the estimators of $(X_1\beta_1, \dots, X_m\beta_m)$ are
compared on the basis of the general quadratic loss (1.30), in which the weight matrix
is a known positive definite matrix, the LSE's of $(X_1\beta_1, \dots, X_m\beta_m)$ based on the
model in (1.27) may not keep their superiority.

Chapter 3 introduces the empirical Bayes estimators addressing the choice between
the models in (1.27) and (1.29). In order to motivate the Bayes estimators of
$(X_1\beta_1, \dots, X_m\beta_m)$, Zellner’s (1986) g-prior is used once again.

For the choice between the models given in (1.27) and (1.28), the analysis
closely follows the one described in Chapter 2. However, if more information is available
about the parameters in the model (1.28), Reinsel’s (1985) prior distribution of
$(\beta_1, \dots, \beta_m)$ may be adopted and modified in finding the Bayes estimators. This
is what is used in Chapter 4 to motivate the empirical Bayes estimators assuming
$X_1 = \dots = X_m = X$. These empirical Bayes estimators serve as a compromise
between the LSE's in the full and the reduced models given in (1.27) and (1.28),
where $X_1 = \dots = X_m = X$.

The outlines of Chapter 3 and Chapter 4 are similar. In Section 3.2 and
Section 4.2, empirical Bayes estimators and positive part empirical Bayes estimators
are introduced. These estimators are then evaluated on the basis of
the frequentist as well as the Bayesian risks. It is shown that there exist empirical
Bayes estimators of $(X_1\beta_1, \dots, X_m\beta_m)$ or $(X\beta_1, \dots, X\beta_m)$ which always dominate
the LSE of the model in (1.27) under squared error loss. The corresponding
robustness and minimax estimation properties are also introduced. The preliminary test
estimators serve as compromises between the LSE's based on the model in (1.27)
and the model in (1.29) (Chapter 3), or between the LSE of the model in (1.27) and
the LSE in the model (1.28) (Chapter 4). Also, it is shown that there are modified
empirical Bayes estimators which dominate the preliminary test estimators.

In both Sections 3.3 and 4.3 several estimators of the regression coefficients are
proposed which shrink the unrestricted LSE to the restricted LSE (Chapter 3) or to
a weighted average of the LSE's (Chapter 4). Under squared error loss,
frequentist risks as well as Bayes risks of these estimators are compared. Numerical
simulations illustrating the applicability of the estimators introduced in Sections
3.2 and 4.2 are then given in Sections 3.4 and 4.4, respectively.

Finally, a summary of the present work and some suggestions for future study
are given in Chapter 5.


CHAPTER 2

EMPIRICAL  AND  HIERARCHICAL  BAYES  ESTIMATION  OF  THE 

RESPONSE  FUNCTION 

2.1  Introduction 

In the analysis of data from designed experiments, especially in response surface
analysis, the problem is often estimation of the response function (see, e.g., Chapter
6 of Khuri and Cornell, 1987). In doing so, there may be a choice between the two
nested regression models

$$Y = X\beta + e = X_1\beta_1 + X_2\beta_2 + e \tag{2.1}$$

and

$$Y = X_1\beta_1 + e, \tag{2.2}$$

where $Y$ is an $n \times 1$ vector of observations, $X = (X_1\ X_2)$ is an $n \times p$ design matrix
of rank $p$, $X_1$ and $X_2$ are of dimensions $n \times p_1$ and $n \times p_2$, respectively,
with $p = p_1 + p_2$, $\beta = (\beta_1^T, \beta_2^T)^T$ is a $p \times 1$ vector of regression parameters with
unknown values, and $e$ is an $n \times 1$ vector of error terms. It is assumed that $e$ is
distributed as $N(0, \sigma^2 V)$, where $V$ is a positive definite (p.d.) matrix. For example,
in factorial experiments, one may want to select one of the two models, the first
involving the main effects as well as the interactions, and the second involving only the
main effects. One classical way to settle this problem is to test the null hypothesis
$H_0: \beta_2 = 0$, and then use model (2.1) if the null hypothesis is rejected at a desired
level of significance, and use model (2.2) otherwise. The above procedure suffers
from the drawback that there is no general way of incorporating the degree of
evidence for or against the null hypothesis in order to arrive at the estimator. For
example, if the observed F value is very close to the critical value (at a chosen level
of significance), then there seems to be no convincing reason for choosing one model
in preference to the other.

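For the problem of best fitting a given set of data by the least squares method,
the full model always fits at least as well as the reduced model. However, the same need not
necessarily hold for estimating $X\beta$. More specifically, let $\hat\beta = (X^T V^{-1} X)^{-1}X^T V^{-1} Y$
denote the least squares estimator (LSE) of $\beta$ based on model (2.1), and
$\tilde\beta_1 = (X_1^T V^{-1} X_1)^{-1}X_1^T V^{-1} Y$ denote the LSE of $\beta_1$ for model (2.2). Then

$$(Y - X\hat\beta)^T V^{-1}(Y - X\hat\beta) \le (Y - X_1\tilde\beta_1)^T V^{-1}(Y - X_1\tilde\beta_1),$$

but that does not necessarily imply that

$$(X\hat\beta - X\beta)^T V^{-1}(X\hat\beta - X\beta) \le (X_1\tilde\beta_1 - X\beta)^T V^{-1}(X_1\tilde\beta_1 - X\beta),$$

or even that

$$E(X\hat\beta - X\beta)^T V^{-1}(X\hat\beta - X\beta) \le E(X_1\tilde\beta_1 - X\beta)^T V^{-1}(X_1\tilde\beta_1 - X\beta)$$

for all $\beta$.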

In this chapter, empirical Bayes (EB) and hierarchical Bayes (HB) estimators of $X\beta$
are proposed. The EB and the HB estimators serve as weighted averages of the
LSE's of $X\beta$ based on the full model (2.1) and the reduced model (2.2). These
weights are data dependent, and are such that the proposed estimators lean more
towards $X\hat\beta$ if the F ratio for testing $H_0: \beta_2 = 0$ is large, and towards $X_1\tilde\beta_1$ when
the opposite is true. In fact, it is shown in Section 2.2 that if there is sufficiently
strong evidence for or against the null hypothesis, then some of the proposed EB
estimators may indeed equal $X\hat\beta$ or $X_1\tilde\beta_1$ as the case may be.

Without shrinking the unrestricted least squares estimator $X\hat\beta$ to the
restricted least squares estimator $X_1\tilde\beta_1$, but instead shrinking $X_2\hat\beta_2$ towards zero, another
estimator was proposed by Sclove (1968). Sclove’s estimator serves as a compromise
between $X_1\hat\beta_1$ and $X\hat\beta$, and is closer to $X_1\hat\beta_1$ if $H_0: \beta_2 = 0$ is true, and closer to
$X\hat\beta$ otherwise. It will be shown in Section 2.2 that the EB estimator enjoys both
Bayesian and frequentist risk superiority over the estimator developed by Sclove.
The result is expected, since $X_1\tilde\beta_1$ is a more appropriate estimator of $X\beta$ than
$X_1\hat\beta_1$ when $H_0: \beta_2 = 0$ holds.

The preliminary test estimators (PTE's) provide yet another compromise
between the LSE's based on the full and reduced models. In Section 2.2, it will be seen
that for every PTE of $X\beta$, there is a corresponding modified EB estimator which
dominates the PTE. This consequence is not surprising, since the degree of evidence
for or against the null hypothesis $H_0: \beta_2 = 0$ is not considered in the PTE's.

The EB estimators and their positive part versions are introduced in Section 2.2.
Several theorems are given which describe the frequentist as well as the Bayesian
properties of the proposed estimators. Specifically, it is shown that there are EB
estimators of $X\beta$ which always dominate $X\hat\beta$ under squared error loss. In Section
2.3, the HB estimators are introduced, which also serve as a compromise between
the LSE's based on the full and reduced models. The link between the HB and EB
estimators is also pointed out. One advantage of an HB procedure over an EB
procedure is that the former can also be used for the construction of credible sets for
$X\beta$. Some real data examples are given in Section 2.4 to illustrate the estimation
methods discussed in Sections 2.2 and 2.3. Also, in two simple response surface
examples, the different estimators of $X\beta$ are compared according to their simulated
frequentist and Bayes risks.

Some of the results of Section 2.2 are related to the work of Ghosh, Saleh and
Sen (1987), where a class of estimators of $\beta_1$ is proposed, compromising between the LSE's
based on the full and reduced models. They adopt an entirely classical EB approach,
whereas the present thesis introduces and relates both the EB and HB approaches.

2.2  The  Empirical  Bayes  Approach

In order to motivate the EB approach, a Bayes estimator of $X\beta$ has to be found. A general Bayes model for the present situation is

$$Y \mid \beta \sim N_n(X\beta,\ \sigma^2 V) \qquad (2.3)$$

and

$$\beta \sim N_p(\nu,\ \Sigma), \qquad (2.4)$$

where $\nu^T = (\nu_1^T, 0^T)$, $\nu_1$ is $p_1 \times 1$ with $p_1 + p_2 = p$, $V$ is known positive definite (p.d.) and $\Sigma$ is p.d. as well. The structure of the prior is motivated by the null hypothesis $H_0: \beta_2 = 0$ mentioned in Section 2.1. Instead of considering the above general prior, following Ghosh, Saleh and Sen (1987), Zellner's (1986) g-prior is considered, where $\Sigma = \tau^2 (X^T V^{-1} X)^{-1}$. The g-prior has been justified by Zellner (1986) using Muth's (1961) rational expectations hypothesis, and the conceptual idea behind the use of the g-prior is introduced in Section 1.1.

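With the above g-prior, the posterior distribution of $\beta \mid Y$ is

$$N_p\left[\nu + (1 - B)(\hat\beta - \nu),\ \sigma^2 (1 - B)(X^T V^{-1} X)^{-1}\right],$$

where

$$\hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} Y$$

is the BLUE of $\beta$ and $B = \sigma^2/(\sigma^2 + \tau^2)$. Writing $\hat\beta^T = (\hat\beta_1^T, \hat\beta_2^T)$, the Bayes estimator of $X\beta$ under any quadratic loss

$$L(X\beta, a) = (a - X\beta)^T Q (a - X\beta), \qquad (2.5)$$

$Q$ being a known p.d. matrix, is given by

$$e_B(Y) = E(X\beta \mid Y) = X_1[\nu_1 + (1 - B)(\hat\beta_1 - \nu_1)] + (1 - B) X_2 \hat\beta_2. \qquad (2.6)$$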
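As a rough numerical illustration of the Bayes estimator in (2.6), the following Python sketch computes the posterior mean of the response under the g-prior when the prior mean, the sampling variance and the prior variance are all treated as known. The function name `bayes_estimator` and its argument names are illustrative assumptions and do not come from the text.

```python
import numpy as np

def bayes_estimator(X1, X2, V, y, nu1, sigma2, tau2):
    """Posterior mean of X*beta under the g-prior, following (2.6).

    nu1, sigma2 and tau2 are assumed known here (hypothetical inputs).
    """
    X = np.hstack([X1, X2])
    Vinv = np.linalg.inv(V)
    C = X.T @ Vinv @ X                       # X' V^{-1} X
    beta_hat = np.linalg.solve(C, X.T @ Vinv @ y)   # BLUE of beta
    B = sigma2 / (sigma2 + tau2)             # Bayes shrinking factor
    nu = np.concatenate([nu1, np.zeros(X2.shape[1])])  # prior mean (nu1, 0)
    post_mean = nu + (1.0 - B) * (beta_hat - nu)
    return X @ post_mean, B
```

With `tau2 = 0` the factor `B` equals 1 and the estimate collapses to the prior guess, while a very large `tau2` gives essentially the full-model BLUE fit.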

Suppose the parameters $\nu_1$, $\sigma^2$ and $\tau^2$ are unknown and need to be estimated from the data. There are two ways to handle this situation. One is the EB method, which will be discussed here, where $\nu_1$, $\sigma^2$ and $\tau^2$ are estimated in some classical way, such as by MLE's, UMVUE's or best invariant estimators, based on the marginal distribution of $Y$. The other is the so-called HB method, discussed in Section 2.3, where some prior distribution, quite often improper, on the unknown parameters is provided, and inference is made based on the resulting posterior distribution.

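In order to generate the EB estimator, first note that marginally

$$Y \sim N_n(X_1\nu_1,\ \sigma^2 V + \tau^2 P_X), \qquad (2.7)$$

where $P_X = X(X^T V^{-1} X)^{-1} X^T$. Using Exercise 2.9, p. 33 of Rao (1973), one can write

$$(\sigma^2 V + \tau^2 P_X)^{-1} = \sigma^{-2}\left[V^{-1} - (1 - B) V^{-1} P_X V^{-1}\right], \qquad (2.8)$$

where one may recall that $B = \sigma^2/(\sigma^2 + \tau^2)$. The BLUE of $\beta_1$ for the reduced model $Y = X_1\beta_1 + e$ is $\tilde\beta_1 = (X_1^T V^{-1} X_1)^{-1} X_1^T V^{-1} Y$, and it is easy to see that

$$\tilde\beta_1 = \hat\beta_1 + C_{11}^{-1} C_{12} \hat\beta_2, \qquad (2.9)$$

where $C_{ij} = X_i^T V^{-1} X_j$ for $i = 1, 2$ and $j = 1, 2$.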
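The two matrix identities used here, the inverse of the marginal covariance and the relation between the reduced-model and full-model BLUE's, can be checked numerically. The sketch below does so on simulated inputs; the dimensions, the random seed and all variable names are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p1, p2 = 12, 2, 3
X1 = rng.normal(size=(n, p1))
X2 = rng.normal(size=(n, p2))
X = np.hstack([X1, X2])
A = rng.normal(size=(n, n))
V = A @ A.T + n * np.eye(n)          # an arbitrary known p.d. matrix
y = rng.normal(size=n)
sigma2, tau2 = 1.5, 0.7
B = sigma2 / (sigma2 + tau2)

Vinv = np.linalg.inv(V)
C = X.T @ Vinv @ X
P_X = X @ np.linalg.inv(C) @ X.T

# Inverse of the marginal covariance matrix sigma^2 V + tau^2 P_X
lhs = np.linalg.inv(sigma2 * V + tau2 * P_X)
rhs = (Vinv - (1.0 - B) * Vinv @ P_X @ Vinv) / sigma2
identity_28_holds = np.allclose(lhs, rhs)

# Reduced-model BLUE expressed through the full-model BLUE
beta_hat = np.linalg.solve(C, X.T @ Vinv @ y)
C11 = X1.T @ Vinv @ X1
C12 = X1.T @ Vinv @ X2
beta_tilde1 = np.linalg.solve(C11, X1.T @ Vinv @ y)
identity_29_holds = np.allclose(
    beta_tilde1, beta_hat[:p1] + np.linalg.solve(C11, C12 @ beta_hat[p1:])
)
```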

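Using Eqs. (2.8) and (2.9), $P_X V^{-1} X_i = X_i$ for $i = 1, 2$, and the idempotency of $P_X V^{-1}$, after some algebra one gets

$$(Y - X_1\nu_1)^T (\sigma^2 V + \tau^2 P_X)^{-1} (Y - X_1\nu_1)$$

$$= \sigma^{-2}(Y - X\hat\beta + X\hat\beta - X_1\tilde\beta_1 + X_1\tilde\beta_1 - X_1\nu_1)^T \left(V^{-1} - (1 - B)V^{-1}P_XV^{-1}\right)(Y - X\hat\beta + X\hat\beta - X_1\tilde\beta_1 + X_1\tilde\beta_1 - X_1\nu_1)$$

$$= \sigma^{-2}\left[SSE + B\,SSH + B(\tilde\beta_1 - \nu_1)^T C_{11} (\tilde\beta_1 - \nu_1)\right], \qquad (2.10)$$

where $SSE = (Y - X\hat\beta)^T V^{-1} (Y - X\hat\beta)$ is the usual error sum of squares, and $SSH = \hat\beta_2^T C_{22.1} \hat\beta_2$ with $C_{22.1} = C_{22} - C_{21} C_{11}^{-1} C_{12}$ is the sum of squares due to the hypothesis $H_0: \beta_2 = 0$. As in Theorem 2 of Ghosh, Saleh and Sen (1987), it is now easy to see that: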

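$(\tilde\beta_1, SSH, SSE)$ is complete sufficient for $\nu_1$, $\tau^2$ and $\sigma^2$.

Since $SSH \sim (\tau^2 + \sigma^2)\chi^2_{p_2}$, the UMVUE of $\sigma^{-2}B = (\tau^2 + \sigma^2)^{-1}$ is $(p_2 - 2)/SSH$ for $p_2 \geq 3$. The best scale invariant estimator of $\sigma^2$ is $SSE/(n - p + 2)$, where $p = p_1 + p_2$, and the UMVUE of $\nu_1$ is $\tilde\beta_1$. Plugging all these estimators for the unknown parameters in (2.6), the EB estimator of $X\beta$ is given by

$$e_{EB}(Y) = X_1\left[\tilde\beta_1 + \left(1 - \frac{p_2 - 2}{F}\right)(\hat\beta_1 - \tilde\beta_1)\right] + \left(1 - \frac{p_2 - 2}{F}\right) X_2\hat\beta_2, \qquad (2.11)$$

where $F = (SSH/SSE)(n - p + 2)$, a constant multiple of the usual F ratio for testing $H_0: \beta_2 = 0$.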

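Due to the nature of the above method of estimation, $B$ is estimated by the quantity $(p_2 - 2)/F$, which can exceed 1. For practical purposes, instead of using the EB estimator in (2.11), the positive part EB estimator

$$e^+_{EB}(Y) = X_1\left[\tilde\beta_1 + \left(1 - \frac{p_2 - 2}{F}\right)^+(\hat\beta_1 - \tilde\beta_1)\right] + \left(1 - \frac{p_2 - 2}{F}\right)^+ X_2\hat\beta_2, \qquad (2.12)$$

where $a^+ = \max(a, 0)$, should be used.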
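The plug-in construction above can be sketched in a few lines of Python. The function below is only an illustrative sketch under the stated model: it forms the full and reduced BLUE fits, the sums of squares SSE and SSH, the statistic F, and then the EB estimator and its positive part version. The function name `eb_estimators` and the returned dictionary keys are assumptions made for this example.

```python
import numpy as np

def eb_estimators(X1, X2, V, y):
    """Sketch of the EB estimator and its positive-part version (p2 >= 3)."""
    n, p1 = X1.shape
    p2 = X2.shape[1]
    p = p1 + p2
    X = np.hstack([X1, X2])
    Vinv = np.linalg.inv(V)
    C = X.T @ Vinv @ X
    beta_hat = np.linalg.solve(C, X.T @ Vinv @ y)       # full-model BLUE
    b1, b2 = beta_hat[:p1], beta_hat[p1:]
    C11, C12, C22 = C[:p1, :p1], C[:p1, p1:], C[p1:, p1:]
    beta_tilde1 = b1 + np.linalg.solve(C11, C12 @ b2)   # reduced-model BLUE
    C221 = C22 - C12.T @ np.linalg.solve(C11, C12)
    resid = y - X @ beta_hat
    sse = resid @ Vinv @ resid                          # error sum of squares
    ssh = b2 @ C221 @ b2                                # hypothesis sum of squares
    F = (ssh / sse) * (n - p + 2)
    weight = 1.0 - (p2 - 2) / F                         # estimate of 1 - B
    fit_full = X @ beta_hat
    fit_reduced = X1 @ beta_tilde1

    def shrink(w):
        # X1[b~1 + w(b^1 - b~1)] + w X2 b^2 = reduced + w (full - reduced)
        return fit_reduced + w * (fit_full - fit_reduced)

    return {"F": F, "weight": weight, "full": fit_full, "reduced": fit_reduced,
            "eb": shrink(weight), "eb_plus": shrink(max(weight, 0.0))}
```

Writing both estimators as `reduced + w * (full - reduced)` makes the compromise explicit: the positive part version simply prevents the weight from becoming negative.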

Note from (2.12) that for very large $F$ values, indicating substantial departure from $H_0: \beta_2 = 0$, $e^+_{EB}$ is very close to $X\hat\beta$, whereas for very small $F$ values, indicating enough support for $H_0$, $e^+_{EB}$ is very close to $X_1\tilde\beta_1$. When there is no clear-cut decision for or against $H_0$, $e^+_{EB}$ is in some sense a weighted average of $X\hat\beta$ and $X_1\tilde\beta_1$, the weights being adaptively determined by the data.

Moreover, if the design matrix $X$ is such that $C_{12} = 0$, then from (2.9), $\tilde\beta_1 = \hat\beta_1$, and $e_{EB}$ defined in (2.11) simplifies to $e_{EB}(Y) = X_1\hat\beta_1 + (1 - (p_2 - 2)/F) X_2\hat\beta_2$. Correspondingly, $e^+_{EB}(Y)$ also simplifies.

The estimators $e_{EB}(Y)$ and $e^+_{EB}(Y)$ are evaluated according to the criteria of frequentist and Bayes risks. First, the frequentist criterion is discussed.

It is important to note that $e_{EB}$ is a member of a class of estimators of the form

$$e_\phi(Y) = X_1\left[\tilde\beta_1 + \left(1 - \frac{\phi(F)}{F}\right)(\hat\beta_1 - \tilde\beta_1)\right] + \left(1 - \frac{\phi(F)}{F}\right) X_2\hat\beta_2. \qquad (2.13)$$

The following theorem provides the risk expression for $e_\phi$ under the loss (2.5). The risk is denoted as $R(X\beta, e_\phi)$.



Theorem 2.2.1. If $\phi$ is a differentiable function, then under the loss (2.5),


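where $G = (X_2 - X_1 C_{11}^{-1} C_{12})^T Q (X_2 - X_1 C_{11}^{-1} C_{12})$.

Proof of Theorem 2.2.1. Now, using

$$\hat\beta_1 = \tilde\beta_1 - C_{11}^{-1} C_{12} \hat\beta_2, \qquad (2.15)$$

$e_\phi(Y)$ can be written as

$$e_\phi(Y) = X_1\tilde\beta_1 + (1 - \phi(F)/F)(X_2 - X_1 C_{11}^{-1} C_{12})\hat\beta_2. \qquad (2.16)$$

Since $E_{\beta,\sigma^2}(\tilde\beta_1) = \beta_1 + C_{11}^{-1} C_{12}\beta_2$, one writes

$$X\beta = X_1\beta_1 + X_2\beta_2 = X_1 E_{\beta,\sigma^2}(\tilde\beta_1) + (X_2 - X_1 C_{11}^{-1} C_{12})\beta_2. \qquad (2.17)$$

Now, using the independence of $\tilde\beta_1$ and $\hat\beta_2$, it follows from (2.16) and (2.17) that under the loss (2.5),

$$R(X\beta, e_\phi) = E_{\beta,\sigma^2}\{(\tilde\beta_1 - E_{\beta,\sigma^2}(\tilde\beta_1))^T (X_1^T Q X_1)(\tilde\beta_1 - E_{\beta,\sigma^2}(\tilde\beta_1))\} + E_{\beta,\sigma^2}\{[(1 - \phi(F)/F)\hat\beta_2 - \beta_2]^T G\, [(1 - \phi(F)/F)\hat\beta_2 - \beta_2]\}$$

$$= \mathrm{tr}\{(X_1^T Q X_1) V(\tilde\beta_1)\} + E_{\beta,\sigma^2}\{(\hat\beta_2 - \beta_2)^T G (\hat\beta_2 - \beta_2)\} - 2E_{\beta,\sigma^2}\{(\phi(F)/F)\,\hat\beta_2^T G (\hat\beta_2 - \beta_2)\} + E_{\beta,\sigma^2}\{(\phi^2(F)/F^2)\,\hat\beta_2^T G \hat\beta_2\}. \qquad (2.18)$$

Write $X\hat\beta = X_1\tilde\beta_1 + (X_2 - X_1 C_{11}^{-1} C_{12})\hat\beta_2$. It follows that under the loss (2.5),

$$R(X\beta, X\hat\beta) = \mathrm{tr}\{(X_1^T Q X_1) V(\tilde\beta_1)\} + E_{\beta,\sigma^2}\{(\hat\beta_2 - \beta_2)^T G (\hat\beta_2 - \beta_2)\}. \qquad (2.19)$$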




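A generalization of Stein's identity (Berger and Haff, 1983) gives

$$E_{\beta,\sigma^2}\left\{\frac{\phi(F)}{F}\,\hat\beta_2^T G (\hat\beta_2 - \beta_2)\right\} = \sigma^2 E_{\beta,\sigma^2}\left\{\frac{\phi(F)}{F}\,\mathrm{tr}(G C_{22.1}^{-1}) + 2\phi'(F)\frac{\hat\beta_2^T G \hat\beta_2}{\hat\beta_2^T C_{22.1} \hat\beta_2} - 2\frac{\phi(F)}{F}\frac{\hat\beta_2^T G \hat\beta_2}{\hat\beta_2^T C_{22.1} \hat\beta_2}\right\}. \qquad (2.20)$$

Combining (2.18), (2.19) and (2.20), one gets (2.14).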

The following corollary to Theorem 2.2.1 provides sufficient conditions under which $e_\phi$ dominates $X\hat\beta$. Let $ch_1(M)$ denote the largest eigenvalue of a matrix $M$.

Corollary 2.2.1. Suppose (i) $\mathrm{tr}(G C_{22.1}^{-1}) > 2\,ch_1(G C_{22.1}^{-1})$; (ii) $\phi(F)$ is nondecreasing in $F$ and $0 < \phi(F) < 2\{(\mathrm{tr}(G C_{22.1}^{-1})/ch_1(G C_{22.1}^{-1})) - 2\}$. Then

$$R(X\beta, e_\phi) < R(X\beta, X\hat\beta).$$


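Proof of Corollary 2.2.1. Following Eqs. (3.8) and (3.9) of Ghosh, Saleh, and Sen (1987), one gets

$$\frac{\phi^2(F)}{F^2}\,\hat\beta_2^T G \hat\beta_2 \leq ch_1(G C_{22.1}^{-1})\, h^2(F)\, F\, \frac{SSE}{n - p + 2}, \qquad (2.21)$$

where $h(F) = \phi(F)/F$. Now, applying (2.18) of Efron and Morris (1976), one gets

$$E\left\{h^2(F)\, F\, \frac{SSE}{n - p + 2}\right\} = \sigma^2 E\left\{\frac{\phi^2(F)}{F} - \frac{4}{n - p + 2}\,\phi(F)\phi'(F)\right\}. \qquad (2.22)$$

From (2.21) and (2.22), one can conclude that

$$E\left\{\frac{\phi^2(F)}{F^2}\,\hat\beta_2^T G \hat\beta_2\right\} \leq ch_1(G C_{22.1}^{-1})\,\sigma^2 E\left\{\frac{\phi^2(F)}{F} - \frac{4}{n - p + 2}\,\phi(F)\phi'(F)\right\}. \qquad (2.23)$$

Using the first part of condition (ii), Eq. (2.14), $\hat\beta_2^T G \hat\beta_2 \leq ch_1(G C_{22.1}^{-1})\,\hat\beta_2^T C_{22.1} \hat\beta_2$, and Eq. (2.23), it follows that

$$R(X\beta, e_\phi) - R(X\beta, X\hat\beta) \leq \sigma^2 E\left\{\frac{\phi(F)}{F}\left[-2\,\mathrm{tr}(G C_{22.1}^{-1}) + 4\,ch_1(G C_{22.1}^{-1}) + \phi(F)\,ch_1(G C_{22.1}^{-1})\right]\right\}. \qquad (2.24)$$

Now using condition (i), and the second part of condition (ii), the proof is completed.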

Consider the special case $Q = V^{-1}$. Then $G = C_{22.1}$. The following corollary is now immediate from Corollary 2.2.1.

Corollary 2.2.2. Suppose $Q = V^{-1}$ and $p_2 \geq 3$; $\phi(F)$ is nondecreasing in $F$ and $0 < \phi(F) < 2(p_2 - 2)$. Then,

$$R(X\beta, e_\phi) < R(X\beta, X\hat\beta).$$
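To make the conditions of the two corollaries concrete, the following Python sketch computes the trace and largest eigenvalue of the matrix product appearing in condition (i), together with the resulting upper bound on the shrinkage function in condition (ii). The function name `shrinkage_upper_bound` is a hypothetical helper introduced only for this illustration.

```python
import numpy as np

def shrinkage_upper_bound(X1, X2, V, Q):
    """Return (bound, trace, ch1) for the matrix G C_{22.1}^{-1}.

    bound = 2 * (trace / ch1 - 2) is the upper limit on phi(F) in
    condition (ii); condition (i) requires trace > 2 * ch1.
    """
    Vinv = np.linalg.inv(V)
    C11 = X1.T @ Vinv @ X1
    C12 = X1.T @ Vinv @ X2
    C22 = X2.T @ Vinv @ X2
    C221 = C22 - C12.T @ np.linalg.solve(C11, C12)
    W = X2 - X1 @ np.linalg.solve(C11, C12)   # X2 - X1 C11^{-1} C12
    G = W.T @ Q @ W
    M = G @ np.linalg.inv(C221)
    trace = float(np.trace(M))
    ch1 = float(np.max(np.linalg.eigvals(M).real))  # largest eigenvalue
    return 2.0 * (trace / ch1 - 2.0), trace, ch1
```

When `Q` is the inverse of `V`, the matrix reduces to the identity, so the trace equals the number of columns of `X2`, the largest eigenvalue is 1, and the bound becomes twice that number minus four, in line with the special case above.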

Corollaries 2.2.1 and 2.2.2 are useful in many ways. Note that under the loss $\sigma^{-2}L(X\beta, a)$, which changes the risk only by a constant multiplier, $X\hat\beta$ is a constant risk minimax estimator of $X\beta$. Hence, any estimator dominating $X\hat\beta$ is also a minimax estimator of $X\beta$. Thus Corollaries 2.2.1 and 2.2.2 provide useful minimax estimators of $X\beta$. Since $e_{EB}$ is a special case of $e_\phi$ with $\phi(F) = p_2 - 2$, it follows from Corollary 2.2.2 that $e_{EB}$ dominates $X\hat\beta$ and is a minimax estimator if $Q = V^{-1}$ and $p_2 \geq 3$. In the special case $V = I_n$, $e_{EB}$ dominates $X\hat\beta$ under squared error loss.

It is also clear from (2.14) that when $Q = V^{-1}$ and $\phi(F) = c$, a positive constant, then $R(X\beta, e_\phi)$ is minimized when $c = p_2 - 2$. Thus, for $Q = V^{-1}$, $e_{EB}$ is optimal within the class of estimators of the form

$$X_1\{\tilde\beta_1 + (1 - cF^{-1})(\hat\beta_1 - \tilde\beta_1)\} + (1 - cF^{-1}) X_2\hat\beta_2.$$

It will be shown later that the Bayesian optimality of $e_{EB}$ holds within the same class of estimators irrespective of any p.d. matrix $Q$ in the loss.




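It is not necessarily true though that $e_{EB}$ always dominates $X_1\tilde\beta_1$ even when $Q = V^{-1}$. Under the loss (2.5) with $Q = V^{-1}$ and from (2.9), writing $(X_1\tilde\beta_1 - X\beta)$ as $X_1(\tilde\beta_1 - E(\tilde\beta_1)) - (X_2 - X_1 C_{11}^{-1} C_{12})\beta_2$, it is easy to check that for estimating $X\beta$, $X_1\tilde\beta_1$ has risk

$$R(X\beta, X_1\tilde\beta_1) = p_1\sigma^2 + \beta_2^T C_{22.1} \beta_2. \qquad (2.25)$$

Also, it is easy to see from (2.14) that with $Q = V^{-1}$ and $\phi(F) = p_2 - 2$,

$$R(X\beta, e_{EB}) = p\sigma^2 - 2(p_2 - 2)^2\sigma^2 E_{\beta,\sigma^2}(F^{-1}) + (p_2 - 2)^2 E_{\beta,\sigma^2}(F^{-2} SSH). \qquad (2.26)$$

Since $F = (n - p + 2)SSH/SSE$, where for fixed $\beta$ and $\sigma^2$, $SSH$ and $SSE$ are independently distributed with $SSH \sim \sigma^2\chi^2_{p_2}(\lambda)$, $\lambda = \beta_2^T C_{22.1} \beta_2/(2\sigma^2)$, and $SSE \sim \sigma^2\chi^2_{n-p}$, it follows from (2.26) that

$$R(X\beta, e_{EB}) = p\sigma^2 - \sigma^2 (p_2 - 2)^2 \left\{\frac{n - p}{n - p + 2}\right\} E(p_2 - 2 + 2K)^{-1}, \qquad (2.27)$$

where $K \sim \mathrm{Poisson}(\lambda)$. Thus from (2.25) and (2.27), as $\lambda \to 0$, $R(X\beta, X_1\tilde\beta_1) \to p_1\sigma^2$, while $R(X\beta, e_{EB}) \to \sigma^2(p - (p_2 - 2)(n - p)/(n - p + 2)) > p_1\sigma^2$ since $p_1 + p_2 = p$. On the other hand, as $\lambda \to \infty$, $R(X\beta, X_1\tilde\beta_1) \to \infty$, but $R(X\beta, e_{EB}) \to p\sigma^2$. Hence, neither $X_1\tilde\beta_1$ nor $e_{EB}$ dominates the other. This phenomenon is to be expected. Small $\lambda$ signifies $\beta_2$ close to zero, in which case clearly $X_1\tilde\beta_1$ is the desired estimator. On the other hand, large $\lambda$ signifies substantial departure from $H_0: \beta_2 = 0$, in which case the reduced model is expected to perform poorly, but the EB estimator turns out to be fairly robust. Note that $X_1\tilde\beta_1$ is not a minimax estimator of $X\beta$, and since $R(X\beta, X_1\tilde\beta_1) \to \infty$ as $\lambda \to \infty$, from the robustness criterion, $e_{EB}$ is clearly the winner over $X\hat\beta$ or $X_1\tilde\beta_1$ when $Q = V^{-1}$.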
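The trade-off between the reduced-model estimator and the EB estimator can be tabulated numerically from the two risk expressions for the case where the loss matrix is the inverse of V. The sketch below is a hedged illustration only: it assumes that the noncentrality parameter equals the quadratic form in the second coefficient block divided by twice the error variance (the convention under which the Poisson mixture applies), and it evaluates the Poisson expectation by a truncated series. The function names and the truncation length `terms` are assumptions.

```python
import math

def expected_inverse(p2, lam, terms=400):
    """E[1 / (p2 - 2 + 2K)] for K ~ Poisson(lam), by a truncated series.

    Requires p2 >= 3; `terms` should be well above lam for accuracy.
    """
    total = 0.0
    for k in range(terms):
        if lam == 0.0:
            prob = 1.0 if k == 0 else 0.0
        else:
            prob = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        total += prob / (p2 - 2 + 2 * k)
    return total

def risk_eb(n, p1, p2, sigma2, lam):
    """Frequentist risk of the EB estimator when Q = V^{-1}."""
    p = p1 + p2
    factor = sigma2 * (p2 - 2) ** 2 * (n - p) / (n - p + 2)
    return p * sigma2 - factor * expected_inverse(p2, lam)

def risk_reduced(p1, sigma2, lam):
    """Risk of the reduced-model fit, using beta2' C22.1 beta2 = 2 sigma2 lam."""
    return p1 * sigma2 + 2.0 * sigma2 * lam
```

For example, with n = 16 and four coefficients in each block, the reduced-model risk is smaller near zero noncentrality, while the EB risk stays below the full-model risk for every value of the noncentrality parameter.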

The EB estimator also dominates, in the frequentist sense, the estimator of $X\beta$ proposed by Sclove (1968). That estimator was developed for the situation in which at least $p_1$ regression coefficients should be included in the model and one is willing to estimate $p_2 \geq 3$ more coefficients $\beta_{p_1+1}, \ldots, \beta_p$. For example, in polynomial regression one may believe that the degree of the polynomial $E(Y_i)$ is at least $p_1$. The corresponding estimator is of the form:

$$e_{Sclove} = X_1\hat\beta_1 + \left(1 - \frac{c}{F}\right) X_2 \hat\beta_2, \qquad (2.28)$$

where $c$ is any number between 0 and $2(p_2 - 2)$, and $F$ as well as $\hat\beta$ are defined as before.

Theorem 2.2.2. Suppose $Q = V^{-1}$ and $p_2 \geq 3$; $\phi(F) = c$, a positive constant, and $0 < c < 2(p_2 - 2)$. Then,

$$R(X\beta, e_\phi) \leq R(X\beta, e_{Sclove}). \qquad (2.29)$$

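Proof of Theorem 2.2.2. Writing

$$e_{Sclove} - X\beta = X\hat\beta - X\beta - \frac{c}{F} X_2 \hat\beta_2,$$

then

$$R(X\beta, e_{Sclove}) = E_{\beta,\sigma^2}\{(X\hat\beta - X\beta)^T V^{-1} (X\hat\beta - X\beta)\} - 2c\,E_{\beta,\sigma^2}\{F^{-1}(X\hat\beta - X\beta)^T V^{-1} X_2 \hat\beta_2\} + c^2 E_{\beta,\sigma^2}\{F^{-2}\hat\beta_2^T C_{22} \hat\beta_2\}. \qquad (2.30)$$

One can write $X\hat\beta - X\beta$ in the cross-product term as $X_1(\tilde\beta_1 - E(\tilde\beta_1)) + (X_2 - X_1 C_{11}^{-1} C_{12})(\hat\beta_2 - \beta_2)$, use the independence of $\tilde\beta_1$ and $\hat\beta_2$, and get

$$E_{\beta,\sigma^2}\{F^{-1}(X\hat\beta - X\beta)^T V^{-1} X_2 \hat\beta_2\} = E_{\beta,\sigma^2}\{F^{-1}(\hat\beta_2 - \beta_2)^T C_{22.1} \hat\beta_2\}.$$

A generalization of Stein's identity (Berger and Haff, 1983) gives

$$E_{\beta,\sigma^2}\{F^{-1}(\hat\beta_2 - \beta_2)^T C_{22.1} \hat\beta_2\} = \sigma^2 (p_2 - 2) E_{\beta,\sigma^2}\{F^{-1}\}. \qquad (2.31)$$

From (2.30), one gets

$$R(X\beta, e_{Sclove}) = p\sigma^2 - \sigma^4\left(\frac{n - p}{n - p + 2}\right) 2c(p_2 - 2) E_{\beta,\sigma^2}\left\{\frac{1}{SSH}\right\} + c^2\sigma^4\left(\frac{n - p}{n - p + 2}\right) E_{\beta,\sigma^2}\left\{\frac{\hat\beta_2^T C_{22} \hat\beta_2}{SSH^2}\right\}$$

$$= p\sigma^2 + \sigma^4\left(\frac{n - p}{n - p + 2}\right)\left\{(-2c(p_2 - 2) + c^2) E_{\beta,\sigma^2}\left(\frac{1}{SSH}\right) + c^2 E_{\beta,\sigma^2}\left(\frac{\hat\beta_2^T C_{21} C_{11}^{-1} C_{12} \hat\beta_2}{SSH^2}\right)\right\}. \qquad (2.32)$$

Using (2.14) with $Q = V^{-1}$, $p_2 \geq 3$ and $\phi(F) = c$, one obtains $R(X\beta, e_\phi) = p\sigma^2 + \sigma^4\left(\frac{n - p}{n - p + 2}\right)(-2c(p_2 - 2) + c^2) E_{\beta,\sigma^2}\left\{\frac{1}{SSH}\right\}$. Since the last term in (2.32) is nonnegative, the proof is completed.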

However, the criticism levelled earlier in this section against $e_{EB}$ still remains. This estimator is obtained by estimating $B$, the Bayes shrinking factor, by a random variable which can assume values bigger than 1 with positive (sometimes, substantially bigger than zero) probability. As mentioned earlier, this deficiency is rectified by $e^+_{EB}$. For every estimator $e_\phi$ defined in (2.16), there is a corresponding estimator $e^+_\phi$ given by

$$e^+_\phi(Y) = X_1\left\{\tilde\beta_1 + \left(1 - \frac{\phi(F)}{F}\right)^+ (\hat\beta_1 - \tilde\beta_1)\right\} + \left(1 - \frac{\phi(F)}{F}\right)^+ X_2 \hat\beta_2. \qquad (2.33)$$

It is natural to anticipate that $e^+_\phi$ has smaller risk than $e_\phi$. The following theorem provides a rigorous justification of this intuition.


Theorem 2.2.3. Under the loss (2.5),

$$R(X\beta, e^+_\phi) < R(X\beta, e_\phi). \qquad (2.34)$$


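Proof of Theorem 2.2.3. Note that since $(1 - \phi(F)/F)^+$ is differentiable a.e., using

$$e^+_\phi(Y) = X_1\tilde\beta_1 + \left(1 - \frac{\phi(F)}{F}\right)^+ (X_2 - X_1 C_{11}^{-1} C_{12})\hat\beta_2, \qquad E(\tilde\beta_1) = \beta_1 + C_{11}^{-1} C_{12} \beta_2,$$

and, by the independence of $\tilde\beta_1$ and $\hat\beta_2$, one gets

$$R(X\beta, e^+_\phi) = \mathrm{tr}\{(X_1^T Q X_1) V(\tilde\beta_1)\} + E_{\beta,\sigma^2}\left\{\left[\left(1 - \frac{\phi(F)}{F}\right)^+\right]^2 \hat\beta_2^T G \hat\beta_2 - 2\left(1 - \frac{\phi(F)}{F}\right)^+ \beta_2^T G \hat\beta_2 + \beta_2^T G \beta_2\right\}.$$

$R(X\beta, e_\phi)$ can be expressed in a similar form. Then,

$$R(X\beta, e_\phi) - R(X\beta, e^+_\phi) = E_{\beta,\sigma^2}\left\{\left(\left(1 - \frac{\phi(F)}{F}\right)^2 - \left[\left(1 - \frac{\phi(F)}{F}\right)^+\right]^2\right)\hat\beta_2^T G \hat\beta_2 - 2\left(\left(1 - \frac{\phi(F)}{F}\right) - \left(1 - \frac{\phi(F)}{F}\right)^+\right)\beta_2^T G \hat\beta_2\right\}. \qquad (2.35)$$

In view of (2.35), for proving the result, it suffices to show that

$$E_{\beta,\sigma^2}\left\{\beta_2^T G \hat\beta_2 \,\Big|\, SSH = h,\ SSE = e,\ \frac{\phi(F)}{F} > 1\right\} > 0, \qquad (2.36)$$

for all $h > 0$ and $e > 0$. However $(\hat\beta_2, SSH)$ is distributed independently of $SSE$, and $F$ is a multiple of $SSH/SSE$. Hence, it suffices to show that

$$E_{\beta,\sigma^2}\{\beta_2^T G \hat\beta_2 \mid SSH = h\} > 0 \quad \forall h > 0. \qquad (2.37)$$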

Since  Cjj.i  is  p.d.,  there  exists  a nonsingular  K such  that  C22.1  = K^K-  Write 
— “ — ^2‘  SSH  = Zj X and  Z_  ~ N[K_§_2’><^^Lp^)-  One  can  write  0^GJ3^  = 

(— ^2^  — —^2  ~ ^ ^ (say),  ^ = K_  Now  using  an  orthogonal  transfor- 

mation U_  = PX,  where  the  first  row  of  P,  is  ||  ^ ||,  one  get  from  (2.37), 

Ep.a^{^Gl^\SSH  = h}  =11  0 II  Ep^^2{Ui  I II  U ||2=  h},  (2.38) 

where  Ui  is  the  first  element  of  C/  and  U_  ~ N{0,a^lp^),  6 = PKS^.  From 

Eq.(2.38),  arguing  as  in  Theorem  6.2  (pp  302-303)  of  Lehmann  (1983),  one  gets 
\ II  C/|P=h}  > 0 and  completes  the  proof  of  Theorem  2.2.3. 

Thus $e^+_{EB}$ has smaller risk than $e_{EB}$. In Section 2.3, we shall find certain HB estimators which are also of the form $e^+_\phi$. Although these positive part estimators, being non-smooth, are inadmissible under quadratic loss, no explicit estimators dominating them are available as yet. In any case, one does not expect any significant savings in the risk over the positive part estimators.

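The discussion of the frequentist risk is concluded by an introduction of the PTE's. A general PTE is given by

$$\delta_{PTE} = X_1[\tilde\beta_1 + g(F)(\hat\beta_1 - \tilde\beta_1)] + X_2\, g(F)\, \hat\beta_2, \qquad (2.39)$$

where $g(F) = I_{[F > d]}$, $d$ being a positive constant depending on the chosen level of significance, and $I$ the usual indicator function. We shall see, however, that the modified EB estimator

$$\delta_{MEB} = X_1\left[\tilde\beta_1 + \left(1 - \frac{\phi_0(F)}{F}\right) g(F)(\hat\beta_1 - \tilde\beta_1)\right] + X_2\left(1 - \frac{\phi_0(F)}{F}\right) g(F)\, \hat\beta_2, \qquad (2.40)$$

dominates $\delta_{PTE}$ under certain conditions. Recall the definition of $G$ in Theorem 2.2.1. The following theorem is proved.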

Theorem 2.2.4. Consider the loss (2.5). Then, if conditions (i) and (ii) of Corollary 2.2.1 hold with $\phi_0(F)$ replacing $\phi(F)$, then,

$$R(X\beta, \delta_{MEB}) \leq R(X\beta, \delta_{PTE}). \qquad (2.41)$$

If $Q = V^{-1}$ and $p_2 \geq 3$, then (2.41) holds under the conditions of Corollary 2.2.2 with $\phi_0(F)$ replacing $\phi(F)$.
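The two estimators compared in this theorem differ only in what happens after the preliminary test rejects. The Python sketch below illustrates this, taking as inputs the fitted vectors from the full and reduced models, the statistic F and the cut-off d. It assumes, purely for illustration, that the shrinkage function is a constant `phi0`; the function name `pte_and_meb` is likewise hypothetical.

```python
import numpy as np

def pte_and_meb(fit_full, fit_reduced, F, d, phi0):
    """Preliminary test estimator and its modified EB counterpart.

    fit_full    : full-model fit (X times the full BLUE)
    fit_reduced : reduced-model fit (X1 times the reduced BLUE)
    phi0        : constant shrinkage function (an assumption of this sketch)
    """
    g = 1.0 if F > d else 0.0            # indicator that the test rejects
    diff = fit_full - fit_reduced
    pte = fit_reduced + g * diff         # switches between the two fits
    meb = fit_reduced + (1.0 - phi0 / F) * g * diff   # shrinks after rejection
    return pte, meb
```

When the test accepts, both estimators return the reduced-model fit; when it rejects, the preliminary test estimator jumps to the full-model fit whereas the modified EB estimator moves only part of the way.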

Proof of Theorem 2.2.4. One writes $\delta_{PTE} = e_{\phi_1}$, where $\phi_1(F) = F(1 - g(F))$, and $\delta_{MEB} = e_{\phi_2}$, where $\phi_2(F) = F[1 - (1 - \phi_0(F)/F)\,g(F)] = \phi_1(F) + \phi_0(F)g(F)$. Since $g(F)$ is differentiable except at $F = d$ and $g'(F) = 0$ a.e., $\phi_1'(F) = \phi_2'(F) - \phi_0'(F)g(F)$ a.e. Also, $\phi_1(F) - \phi_2(F) = -\phi_0(F)g(F)$ and $\phi_1^2(F) - \phi_2^2(F) = -2\phi_0(F)\phi_1(F)g(F) - \phi_0^2(F)g^2(F) = -\phi_0^2(F)g(F)$ (since $g(F)(1 - g(F)) = 0$ and $g^2(F) = g(F)$). Now,




using  (2.14),  it  can  be  shown  that 


R[K§^,Spte)  - R[X§_,6_f^pQ) 

~T  ~ 

= -Ao^Ef_M-4:,(F),(F)S^^} 


(2.42) 


SSH 

-2<T=g,,„.{— (G£-\)  - 2£^j  } 

iT  L V 22.1)  If 

Since  <^o(R)  ^ 0 a.e.,  it  follows  from  (2.42)  that 

R{K^,6pte)  - R{K§^,^meb) 

> ) - 2|g4]  } 

- Ei.A^^s(F)'g^Gl^}. 

Arguing  as  in  Corollary  2.2.1,  one  gets  (2.41)  from  (2.43). 

In  the  particular  case  Q = V~^  so  that  G = it  follows  from  (2.43)  that 

R{Xi_,6p^E)-R[X^,S^EB) 

^ - r n — p ^2  ^ ’ 

Now  define  h{F)  = 4>o[F)IF  and  apply  (2.18)  of  Efron  and  Morris  (1976)  to  get 

Bl.,^{h\F)Fg{F)X^^  J 


(2.43) 


— a^Ep,j2{ 
+ 


n — p 


n — p + 2 
2 n — p 


n-  p + 2 

h'^{F)Fg{F) 


n-p+2  n- p+2 

.2  rr  fTl-  p -2 


S+h(F)h:(F)Fg(F)  + e(F)g(F))(-^±^)} 

n-p+2^e 


MF) 


- f.'ijp  r^~P-2  (f>UF)g(F)  4 

~ " — F ;rr7TiW«f)(^-^} 

■•^WgiF)  4 


n-p  + 2 


MF)g{F)cj)'^[F)} 


F 


(2.45) 




Combining Eqs. (2.44) and (2.45), the second part of Theorem 2.2.4 is therefore proved.

Note that $\delta_{MEB}$ estimates $X\beta$ by $X_1\tilde\beta_1$ if $g(F) = 0$, i.e., if one accepts the null hypothesis at a desired level of significance. However, in the event the null hypothesis is rejected, one estimates $X\beta$ by a shrinkage estimator as described in the earlier part of this section instead of $X\hat\beta$.

The above theorem provides a class of estimators improving on the PTE. However, in view of Theorem 2.2.3, $\delta_{MEB}$ can be further improved by a positive part modified EB estimator $\delta^+_{MEB}$ given by

$$\delta^+_{MEB} = X_1\left[\tilde\beta_1 + \left(1 - \frac{\phi_0(F)}{F}\right)^+ g(F)(\hat\beta_1 - \tilde\beta_1)\right] + X_2\left(1 - \frac{\phi_0(F)}{F}\right)^+ g(F)\, \hat\beta_2. \qquad (2.46)$$

Now, the Bayes risk optimality of $e_{EB}$ is discussed. In view of Theorem 2.2.1, the Bayes risk of $e_{EB}$ is smaller than that of $X\hat\beta$ when $Q = V^{-1}$ and $p_2 \geq 3$ irrespective of any prior, while $e^+_{EB}$ always dominates $e_{EB}$ irrespective of any prior. However, the Bayes risk optimality of $e_{EB}$ over $X\hat\beta$ is not restricted to $Q = V^{-1}$. The following theorem shows that under the prior described at the beginning of this section, the Bayes risk of $e_{EB}$ is smaller than that of $X\hat\beta$ irrespective of any p.d. $Q$.

Theorem 2.2.5. Consider the model $Y \mid \beta \sim N(X\beta, \sigma^2 V)$, where $V$ is p.d. and $\beta \sim N(\nu, \tau^2 (X^T V^{-1} X)^{-1})$ with $\nu^T = (\nu_1^T, 0^T)$. Denote the given prior by $\xi$ and let $r(\xi, e)$ denote the Bayes risk of an estimator $e$ of $X\beta$ under the prior $\xi$ when the loss is given in (2.5). Then,

$$r(\xi, e_{EB}) < r(\xi, X\hat\beta). \qquad (2.47)$$


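Proof of Theorem 2.2.5. From (2.6), $e_{EB} - X\beta$ can be expressed as

$$e_{EB} - X\beta = X(\hat\beta - \beta) - \frac{p_2 - 2}{F}\, X\left(\hat\beta - \begin{pmatrix}\tilde\beta_1 \\ 0\end{pmatrix}\right). \qquad (2.48)$$

Hence, one gets

$$r(\xi, e_{EB}) = E\left\{(\hat\beta - \beta)^T X^T Q X (\hat\beta - \beta) - 2\,\frac{p_2 - 2}{F}\,(\hat\beta - \beta)^T X^T Q X \left(\hat\beta - \begin{pmatrix}\tilde\beta_1 \\ 0\end{pmatrix}\right) + \frac{(p_2 - 2)^2}{F^2}\left(\hat\beta - \begin{pmatrix}\tilde\beta_1 \\ 0\end{pmatrix}\right)^T X^T Q X \left(\hat\beta - \begin{pmatrix}\tilde\beta_1 \\ 0\end{pmatrix}\right)\right\}. \qquad (2.49)$$

Since $E_{\beta,\sigma^2}(\hat\beta - \beta)^T X^T Q X (\hat\beta - \beta) = \sigma^2\,\mathrm{tr}[(X^T V^{-1} X)^{-1} X^T Q X]$, therefore, under the marginal distribution of $\hat\beta$, one gets

$$E\{(\hat\beta - \beta)^T X^T Q X (\hat\beta - \beta)\} = \sigma^2\,\mathrm{tr}[(X^T V^{-1} X)^{-1} X^T Q X] = r(\xi, X\hat\beta). \qquad (2.50)$$

Next note that $\tilde\beta_1$, $\hat\beta_2$ and $SSE = (Y - X\hat\beta)^T V^{-1} (Y - X\hat\beta)$ are mutually independent (see Lemma 2 of Ghosh, Saleh and Sen, 1987) and marginally $\tilde\beta_1 \sim N(\nu_1, \sigma^2 B^{-1} C_{11}^{-1})$, $\hat\beta_2 \sim N(0, \sigma^2 B^{-1} C_{22.1}^{-1})$, and $SSE \sim \sigma^2\chi^2_{n-p}$. Recall $B = \sigma^2/(\sigma^2 + \tau^2)$. Then, by conditioning on $Y$ and using $E(\beta \mid Y) = \hat\beta - B(\hat\beta - \nu)$ with $\nu^T = (\nu_1^T, 0^T)$ and Eq. (2.9), the cross-term in (2.49) divided by 2 becomes

$$-E\left\{\frac{p_2 - 2}{F}\,(\hat\beta - E(\beta \mid Y))^T X^T Q X \left(\hat\beta - \begin{pmatrix}\tilde\beta_1 \\ 0\end{pmatrix}\right)\right\} = -B\,E\left\{\frac{p_2 - 2}{F}\,\hat\beta_2^T G \hat\beta_2\right\} = -B\sigma^2\,\frac{(p_2 - 2)(n - p)}{n - p + 2}\,E\left\{\frac{\hat\beta_2^T G \hat\beta_2}{\hat\beta_2^T C_{22.1} \hat\beta_2}\right\}. \qquad (2.51)$$

Now, $\hat\beta_2^T C_{22.1} \hat\beta_2$ is a function of the complete sufficient statistic, whereas $\hat\beta_2^T G \hat\beta_2/\hat\beta_2^T C_{22.1} \hat\beta_2$ is ancillary. Hence, using Basu's Theorem,

$$E(\hat\beta_2^T G \hat\beta_2) = E(\hat\beta_2^T C_{22.1} \hat\beta_2)\,E(\hat\beta_2^T G \hat\beta_2/\hat\beta_2^T C_{22.1} \hat\beta_2).$$

From the above identity and the fact that $\hat\beta_2 \sim N(0, \sigma^2 B^{-1} C_{22.1}^{-1})$, one gets

$$E\left\{\frac{\hat\beta_2^T G \hat\beta_2}{\hat\beta_2^T C_{22.1} \hat\beta_2}\right\} = \frac{E(\hat\beta_2^T G \hat\beta_2)}{E(\hat\beta_2^T C_{22.1} \hat\beta_2)} = \frac{\mathrm{tr}(G C_{22.1}^{-1})}{p_2}. \qquad (2.52)$$

Using (2.9), the third term of (2.49) can be written as

$$E\left\{\frac{(p_2 - 2)^2}{F^2}\,\hat\beta_2^T G \hat\beta_2\right\} = \sigma^4\,\frac{(p_2 - 2)^2 (n - p)}{n - p + 2}\,E\left\{\frac{\hat\beta_2^T G \hat\beta_2}{(\hat\beta_2^T C_{22.1} \hat\beta_2)^2}\right\}. \qquad (2.53)$$

Finally, using an argument similar to that for Eq. (2.52),

$$E\left\{\frac{\hat\beta_2^T G \hat\beta_2}{(\hat\beta_2^T C_{22.1} \hat\beta_2)^2}\right\} = E\left\{\frac{\hat\beta_2^T G \hat\beta_2}{\hat\beta_2^T C_{22.1} \hat\beta_2}\right\} E\{(\hat\beta_2^T C_{22.1} \hat\beta_2)^{-1}\} = \left\{\frac{\mathrm{tr}(G C_{22.1}^{-1})}{p_2}\right\}\sigma^{-2} B (p_2 - 2)^{-1}. \qquad (2.54)$$

Combining (2.49) through (2.54), one gets

$$r(\xi, e_{EB}) = r(\xi, X\hat\beta) - \sigma^2 B\,\frac{n - p}{n - p + 2}\,\frac{p_2 - 2}{p_2}\,\mathrm{tr}(G C_{22.1}^{-1}). \qquad (2.55)$$

This completes the proof of (2.47).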

The next theorem provides necessary and sufficient conditions under which the Bayes risk of $e_{EB}$ is smaller than that of $X_1\tilde\beta_1$.

Theorem 2.2.6. Consider the setup of Theorem 2.2.5. Then

$$r(\xi, e_{EB}) < r(\xi, X_1\tilde\beta_1), \qquad (2.56)$$

if  and  only  if  (l  - BY/B^  > 1 - {{n  - p)/{n  -p  + 2)}{{p,  - 2) /p^}. 

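Proof of Theorem 2.2.6. Using the identity $X_1\tilde\beta_1 - X\beta = X_1(\tilde\beta_1 - E_{\beta,\sigma^2}(\tilde\beta_1)) - (X_2 - X_1 C_{11}^{-1} C_{12})\beta_2$,

$$R(X\beta, X_1\tilde\beta_1) = \sigma^2\,\mathrm{tr}\{(X_1^T Q X_1) C_{11}^{-1}\} + \beta_2^T G \beta_2. \qquad (2.57)$$

Recall $\beta \sim N(\nu, \tau^2 (X^T V^{-1} X)^{-1})$ with $\nu^T = (\nu_1^T, 0^T)$. Then

$$r(\xi, X_1\tilde\beta_1) = \sigma^2\,\mathrm{tr}\{(X_1^T Q X_1) C_{11}^{-1}\} + \tau^2\,\mathrm{tr}(G C_{22.1}^{-1}). \qquad (2.58)$$

Note that $\mathrm{tr}\{(X_1^T Q X_1) C_{11}^{-1}\} + \mathrm{tr}(G C_{22.1}^{-1}) = \mathrm{tr}\{(X^T Q X)(X^T V^{-1} X)^{-1}\}$ after some algebra. Now, from Eqs. (2.58) and (2.55),

$$r(\xi, X_1\tilde\beta_1) - r(\xi, e_{EB}) = \left(\tau^2 - \sigma^2 + B\sigma^2\,\frac{n - p}{n - p + 2}\,\frac{p_2 - 2}{p_2}\right)\mathrm{tr}(G C_{22.1}^{-1}) > 0, \qquad (2.59)$$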


occurs  as  > l - B{n  - p){p2  - 2)/{n  - p + 2)(pj)  i.e.  (1  - 5)7(5^)  > 

1 - [(n  - p)/{n  -p  + 2)][(p2  - 2)/p2]. 

The significance of the above theorem is very clear. If $\tau^2 > \sigma^2$, then $B < \frac{1}{2}$, in which case $(1 - B)/B^2 > 1$. Thus $e_{EB}$ always dominates $X_1\tilde\beta_1$ in the Bayes risk if the sampling variability is smaller than the prior variability. However, if $\tau^2$ is much smaller than $\sigma^2$, then the reduced model seems to be very appropriate, and $X_1\tilde\beta_1$ becomes a clear winner.

As discussed earlier in this section, one can obtain a general class of minimax estimators of $X\beta$ of the form

$$e_c(Y) = X_1\left[\tilde\beta_1 + \left(1 - \frac{c}{F}\right)(\hat\beta_1 - \tilde\beta_1)\right] + X_2\left(1 - \frac{c}{F}\right)\hat\beta_2, \qquad (2.60)$$

where $c$ is an appropriate constant. Under the same prior, the following theorem shows the Bayes risk optimality of $e_c$ over $e_{Sclove}$ when $Q = V^{-1}$.

Theorem 2.2.7. Under the loss

$$L(X\beta, a) = (a - X\beta)^T V^{-1} (a - X\beta),$$

$$r(\xi, e_c) \leq r(\xi, e_{Sclove}),$$

where $0 < c < 2(p_2 - 2)$.




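Proof of Theorem 2.2.7. From the proof of Theorem 2.2.5, following the procedure of obtaining $r(\xi, e_{EB})$ with $c$ replacing $p_2 - 2$ and $V^{-1}$ replacing $Q$, one gets $G = C_{22.1}$. Hence

$$r(\xi, e_c) = p\sigma^2 + \left(\frac{c^2}{p_2 - 2} - 2c\right)\sigma^2 B\,\frac{n - p}{n - p + 2}. \qquad (2.61)$$

Using the result for $R(X\beta, e_{Sclove})$ in Eq. (2.32), and Basu's theorem, one can show that

$$r(\xi, e_{Sclove}) = r(\xi, e_c) + c^2\sigma^2 B\,\frac{n - p}{n - p + 2}\,\frac{\mathrm{tr}(C_{21} C_{11}^{-1} C_{12} C_{22.1}^{-1})}{p_2 (p_2 - 2)}. \qquad (2.62)$$

The conclusion of the theorem follows.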

In fact, if one estimates the prior parameters by some method other than the one proposed here, one can get EB estimators different from $e_{EB}$. However, the following theorem shows that $e_{EB}$ enjoys an optimality within the class of all estimators of the form $e_c(Y)$, i.e. among those $e_\phi$ with $\phi(F) = c$ for all $F$.


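Theorem 2.2.8. Consider the same setup as in Theorem 2.2.5. Then,

$$r(\xi, e_c) - r(\xi, e_B) = B\sigma^2\,\mathrm{tr}[(X^T V^{-1} X)^{-1}(X^T Q X)] + \left(\frac{c^2}{p_2 - 2} - 2c\right)\sigma^2 B\left(\frac{n - p}{n - p + 2}\right)\frac{\mathrm{tr}(G C_{22.1}^{-1})}{p_2}. \qquad (2.63)$$

For $p_2 \geq 3$, the above risk is minimized at $c = p_2 - 2$.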

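Proof of Theorem 2.2.8. With $c$ replacing $p_2 - 2$ in the procedure used in the proof of Theorem 2.2.5, one gets

$$r(\xi, e_c) = \sigma^2\,\mathrm{tr}[(X^T V^{-1} X)^{-1}(X^T Q X)] + \left(\frac{c^2}{p_2 - 2} - 2c\right)\sigma^2 B\left(\frac{n - p}{n - p + 2}\right)\frac{\mathrm{tr}(G C_{22.1}^{-1})}{p_2}. \qquad (2.64)$$

And

$$r(\xi, e_B) = \mathrm{tr}\{E[(E(\beta \mid Y) - \beta)(E(\beta \mid Y) - \beta)^T](X^T Q X)\}$$

$$= \mathrm{tr}\{E[E\{(E(\beta \mid Y) - \beta)(E(\beta \mid Y) - \beta)^T \mid Y\}](X^T Q X)\}$$

$$= \mathrm{tr}\{E[V(\beta \mid Y)](X^T Q X)\}$$

$$= \sigma^2 (1 - B)\,\mathrm{tr}[(X^T V^{-1} X)^{-1}(X^T Q X)]. \qquad (2.65)$$

Combine (2.64) and (2.65) to get (2.63). Thus, the proof of Theorem 2.2.8 is completed.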
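The optimal constant can also be seen numerically: the only part of the Bayes risk that depends on the constant c is a quadratic, equal to c squared over (p2 - 2) minus 2c, times a positive factor. The sketch below evaluates this term over a grid and locates its minimum. The function name, the argument `trace_term` (standing for the trace of G times the inverse of C22.1) and the grid are assumptions made for illustration.

```python
import numpy as np

def bayes_risk_excess(c, n, p, p2, sigma2, B, trace_term):
    """c-dependent part of the Bayes risk of the estimator with phi(F) = c."""
    quad = c ** 2 / (p2 - 2) - 2.0 * c
    return quad * sigma2 * B * (n - p) / (n - p + 2) * trace_term / p2

# Grid search over 0 <= c <= 2(p2 - 2) for a 16-run design with p1 = p2 = 4
n, p, p2 = 16, 8, 4
grid = np.linspace(0.0, 2.0 * (p2 - 2), 401)
values = bayes_risk_excess(grid, n, p, p2, sigma2=1.0, B=0.5, trace_term=4.0)
best_c = grid[np.argmin(values)]
```

The minimizer found on the grid coincides with the value p2 - 2, and the excess term is negative throughout the open interval between 0 and 2(p2 - 2).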

It is also possible to motivate the proposed $e_{EB}$ by using a prior distribution on $\sigma^2$ instead of estimating $\sigma^2$ from the marginal distribution of $Y$. Consider

$$\beta \mid \sigma^2 \sim N_p(\nu,\ \sigma^2 g^{-1} (X^T V^{-1} X)^{-1}),$$

and $\sigma^2$ has some pdf $m(\sigma^2)$ (say). Under this prior, the posterior mean of $X\beta$ is still identical to the expression given in (2.6). Note that $B = \sigma^2/(\sigma^2 + \sigma^2 g^{-1}) = g/(1 + g)$. It follows from (2.7) that

$$(Y \mid \sigma^2) \sim N_n(X_1\nu_1,\ \sigma^2 (V + g^{-1} P_X)).$$

Accordingly $SSH \mid \sigma^2 \sim \sigma^2 (1 + g^{-1})\chi^2_{p_2}$, while $SSE \mid \sigma^2 \sim \sigma^2\chi^2_{n-p}$. Thus, it is still sensible to estimate $B$ by $(p_2 - 2)/F$ or, more appropriately, by $\min(1, (p_2 - 2)/F)$. The above EB estimator maintains its frequentist and Bayes risk optimality within the class of estimators $e_c$ as given in (2.60) even under this alternative model.

2.3  The  Hierarchical  Bayes  Approach

As discussed in Section 2.2, as an alternative to the classical EB method, one can use an HB method, where one puts a prior distribution, often improper, on the unknown hyperparameters, and arrives at the posterior distribution of $\beta$ given $Y = y$. Inference about $\beta$ or $X\beta$ is then based on this posterior distribution. For point estimation, usually both these approaches stand more or less on equal footing. However, unlike the EB estimator of $\beta$ or $X\beta$, which has no natural confidence or credible set associated with it, the HB method provides, at least with reasonable approximation, a credible set for $X\beta$.

In this section a general posterior distribution of $\beta$ given $Y = y$ is first derived. In the special case of the g-prior given in Section 2.2, the posterior distribution simplifies considerably. In that case, for certain improper priors on the hyperparameters, there is considerable similarity between the EB and HB approaches. Also, some of the positive part EB estimators can be identified as HB estimators, where one uses the posterior mode instead of the posterior mean for estimating certain parameters.

In  order  to  provide  the  first  general  result  of  this  section,  first  write 
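Consider the model under which $(Y \mid \beta, r_1) \sim N(X\beta, r_1^{-1} V)$, where $V$ is a known p.d. matrix. Also, let $(\beta \mid \nu, r_2) \sim N(\nu, r_2^{-1} D^{-1})$, where $\nu^T = (\nu_1^T, 0^T)$, and $D$ is a p.d. matrix. Finally assume that $\nu_1$, $R_1$ and $R_2$ are marginally independently distributed with $\nu_1 \sim$ uniform on the $p_1$-dimensional Euclidean space, $R_1 \sim \mathrm{Gamma}(\frac{1}{2}a_1, \frac{1}{2}f_1)$ and $R_2 \sim \mathrm{Gamma}(\frac{1}{2}a_2, \frac{1}{2}f_2)$. $Z$ is said to have a $\mathrm{Gamma}(\alpha, \beta)$ distribution if $Z$ has pdf

$$f(z) = \exp(-\alpha z)\, z^{\beta - 1} \alpha^\beta / \Gamma(\beta)\; I_{[z > 0]}, \qquad \alpha > 0,\ \beta > 0.$$

In order to find the posterior distribution of $\beta$, recall that $C_{ij} = X_i^T V^{-1} X_j$ $(i, j = 1, 2)$, and $C_{22.1} = C_{22} - C_{21} C_{11}^{-1} C_{12}$. Also, partition $D$ as

$$D = \begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix}.$$

Write $D_{22.1} = D_{22} - D_{21} D_{11}^{-1} D_{12}$. Then the posterior distribution of $\beta$ given $Y = y$ is as follows: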


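Theorem 2.3.1. Conditional on $Y = y$, $R_1 = r_1$ and $R_2 = r_2$,

$$\beta \sim N\left\{\begin{pmatrix} \hat\beta_1 + r_2 C_{11}^{-1} C_{12} (r_1 C_{22.1} + r_2 D_{22.1})^{-1} D_{22.1} \hat\beta_2 \\ r_1 (r_1 C_{22.1} + r_2 D_{22.1})^{-1} C_{22.1} \hat\beta_2 \end{pmatrix},\ \begin{pmatrix} r_1 C_{11} & r_1 C_{12} \\ r_1 C_{21} & r_1 C_{22} + r_2 D_{22.1} \end{pmatrix}^{-1}\right\}. \qquad (2.66)$$

Also, the joint posterior distribution of $R_1$ and $R_2$ given $Y = y$ is given by

$$f(r_1, r_2 \mid y) \propto r_1^{\frac{1}{2}(n - p + f_1) - 1}\, r_2^{\frac{1}{2}(p_2 + f_2) - 1} \times \left|C_{22.1} + \frac{r_2}{r_1} D_{22.1}\right|^{-\frac{1}{2}}$$

$$\times \exp\left(-\tfrac{1}{2} r_1 (SSE + a_1)\right) \exp\left(-\tfrac{1}{2} r_2\, \hat\beta_2^T \left(D_{22.1}^{-1} + \frac{r_2}{r_1} C_{22.1}^{-1}\right)^{-1} \hat\beta_2 - \tfrac{1}{2} a_2 r_2\right). \qquad (2.67)$$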




Proof  of  Theorem  2.3.1.  Writing  C = X^V~^X,  the  joint  pdf  of  Y,  R, 

and  i?2  is 


f{y,0,  H,  ri.rj) 

oc  r”/'  exp[  -iri{(y  - XlfV_-\y  - + (|  - §_fC[l  - §)}  ] 

xrl'^  exp[  -^r2(^  - u)'^ D {§_- u)] 

^-1  1 h.-i  1 

xri^  exp(--airi)r2^  exp(--a2r2).  (2.68) 

Since  uf'  = (i/f  ,0^),  writing  D=[  — ^ 

V ^1  2.22  j ’ 

[0  - - 41) 


= (&  - - J^i)  + + 2(2.  - 41.)^C.;2, 

= (ill  - 2i  - Sii'Cii ^j)^Sii(ili  - 2i  - CnCii 2i)  + £2.22.1 2j-  (2-69) 
Now,  integrating  with  respect  to  it  follows  from  Eq.(2.68)  that 


oc 

X 


f[y,§_,  ri,T2) 

r"/'  exp[  -lri{5S’E  Y Q_-  0fC{0  - ^)}] 

r?’/^xp(-lr2^^£,,.,^^)rf^‘-^xp(-lair0r,-^^^-^exp(-ia2r2).(2J^ 


Now,  using  ""an  be  shown  that 


if  C{1- 0)  + T2f^D22.,i^ 

+ 2(^2  - ~ii+  ^12^2)  ] + £22.1^2 

= ri[  {0^  - 01^0100^  - 00  + 0^  0.22.102  ~ 2^r^22.1^2 

+ ‘^02  02001  - ^1)  ] + 0^  {ri022  + r2D22.\)02 

= ^A[0,-0,+A00^C,00^-0^  + A00] 

+ (^2  ~ 22  + ^2D.22,i){0^  - K00 

+^r00^-K0,fO200^-l+A00 

+02  (^1^22.1  - riA^OuA  - K^{riC22  + r2D22.i)K  + 2riK^ C2iA)0^,  (2.71) 




where  A and  K satisfy 

= ~2(riC22  + ^2^22.1)^  + 2riC2iA. 

Solving  (2.72)  and  (2.73)  one  gets 

K = '•l(riC22.i+r2D22.i)-'C22.i, 

A = C~^C,2K. 

Using  (2.74)  and  (2.75),  part  of  the  last  term  in  Eq.(2.71)  can  be  expressed  as: 

^1(^22.!  - £C_x\A  + ^K^C^iA)  - K^{riC_22  + r2D22,i)K 
= ^iQ.22.1  - ^1^22.1(^22.1  + ~^22.l)~^C22.i 

= »-1^22.1  - ri(C22.i  + ^£22.1  - ^£22.i)(C22.i  + ^^22.l)“'C22.i 

= '■2£22.i(^22.1  + ~^22.i)~^£22  1 
^1 

= »’2£22.i(^22.i(£m\  + ~^22\)£22.i)"^U22.i 

= '‘2(£22\  + — ^22^)"^-  (2.76) 

It  follows  (2.70),  (2.71),  (2.74),  (2.75)  and  (2.76)  that  the  posterior  distribution  of 
^ given  y,  ri  and  r2  is  given  by  (2.66). 

Again,  following  (2.70),  (2.71),  (2.74),  (2.75),  (2.76)  and 

^1^11  ^iQ.12 

^iQ-21  ^1^22  + ^2^22.1 


(2.72) 

(2.73) 

(2.74) 

(2.75) 


- kl^llP'^*|riC22.i  +r2^22.l|^'^^ 

= 1^11 1 1^22.1  + ~^22l|) 

^1 

it  shows  the  joint  pdf  of  F,  R^,  and  R2  as: 


(2.77) 


/(y,  >•1,^2) 




(2.78) 


This  leads  to  (2.67).  The  proof  of  Theorem  2.3.1  is  completed. 

It is possible to draw inference about $\beta$ or $X\beta$ based on the above posterior distribution. One can obtain expressions for $E(X\beta \mid Y = y)$ and $V(X\beta \mid Y = y)$ which will lead to point estimators as well as credible sets for $X\beta$. In the special case of g-priors, the above posterior distribution, however, simplifies considerably.

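To see this, note that for the g-prior of Section 2.2, $D = X^T V^{-1} X$. Then, writing $B = R_2/(R_1 + R_2)$, $b = r_2/(r_1 + r_2)$, and recalling that $\tilde\beta_1 = \hat\beta_1 + C_{11}^{-1} C_{12} \hat\beta_2$, it follows from (2.66) that conditional on $Y = y$, $R_1 = r_1$ and $R_2 = r_2$,

$$\beta \sim N\left[\begin{pmatrix} \hat\beta_1 + b\, C_{11}^{-1} C_{12} \hat\beta_2 \\ (1 - b)\hat\beta_2 \end{pmatrix},\ r_1^{-1}\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} + \frac{b}{1 - b} C_{22.1} \end{pmatrix}^{-1}\right]. \qquad (2.79)$$

Using the usual inversion formula for partitioned matrices (see e.g., Exercise 2.7, p. 33 of Rao, 1973), one gets

$$\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} + \frac{b}{1 - b} C_{22.1} \end{pmatrix}^{-1} = \begin{pmatrix} C_{11}^{-1} + (1 - b) C_{11}^{-1} C_{12} C_{22.1}^{-1} C_{21} C_{11}^{-1} & -(1 - b) C_{11}^{-1} C_{12} C_{22.1}^{-1} \\ -(1 - b) C_{22.1}^{-1} C_{21} C_{11}^{-1} & (1 - b) C_{22.1}^{-1} \end{pmatrix}. \qquad (2.80)$$

Also, it follows from (2.67) that under the g-prior, the joint posterior distribution of $R_1$ and $R_2$ given $Y = y$ is

$$f(r_1, r_2 \mid y) \propto r_1^{\frac{1}{2}(n - p_1 + f_1) - 1}\, r_2^{\frac{1}{2}(p_2 + f_2) - 1} (r_1 + r_2)^{-\frac{1}{2}p_2} \times \exp\left[-\tfrac{1}{2} r_1\left(SSE + a_1 + \frac{r_2}{r_1 + r_2} SSH\right) - \tfrac{1}{2} a_2 r_2\right]. \qquad (2.81)$$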



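Since $B = R_2/(R_1 + R_2)$, the joint posterior distribution of $R_1$ and $B$ given $Y = y$ is

$$f(r_1, b \mid y) \propto r_1^{\frac{1}{2}(n - p_1 + f_1 + f_2) - 1}\, b^{\frac{1}{2}(p_2 + f_2) - 1} (1 - b)^{-\frac{1}{2}f_2 - 1} \times \exp\left[-\tfrac{1}{2} r_1 (SSE + a_1 + b\,SSH + a_2 b (1 - b)^{-1})\right]. \qquad (2.82)$$

It follows from (2.82) that the posterior distribution of $B$ given $Y = y$ is

$$f(b \mid y) \propto b^{\frac{1}{2}(p_2 + f_2) - 1} (1 - b)^{-\frac{1}{2}f_2 - 1} (SSE + a_1 + b\,SSH + a_2 b (1 - b)^{-1})^{-\frac{1}{2}(n - p_1 + f_1 + f_2)}, \qquad 0 < b < 1. \qquad (2.83)$$

Now, from (2.79), since $E(X\beta \mid Y, B, R_1) = X_1\tilde\beta_1 + (1 - B)(X_2 - X_1 C_{11}^{-1} C_{12})\hat\beta_2$,

$$E(X\beta \mid Y) = X_1\tilde\beta_1 + (X_2 - X_1 C_{11}^{-1} C_{12})(1 - E(B \mid Y))\hat\beta_2. \qquad (2.84)$$

Also,

$$V[E(X\beta \mid Y, B, R_1) \mid Y] = V[X_1\tilde\beta_1 + (1 - B)(X_2 - X_1 C_{11}^{-1} C_{12})\hat\beta_2 \mid Y]$$

$$= V(B \mid Y)(X_2 - X_1 C_{11}^{-1} C_{12})\hat\beta_2 \hat\beta_2^T (X_2 - X_1 C_{11}^{-1} C_{12})^T, \qquad (2.85)$$

and

$$E[V(X\beta \mid Y, B, R_1) \mid Y] = X \begin{pmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{pmatrix} X^T, \qquad (2.86)$$

where

$$s_{11} = E(R_1^{-1} \mid Y)\, C_{11}^{-1} + E(R_1^{-1}(1 - B) \mid Y)\, C_{11}^{-1} C_{12} C_{22.1}^{-1} C_{21} C_{11}^{-1},$$

$$s_{12} = s_{21}^T = -E(R_1^{-1}(1 - B) \mid Y)\, C_{11}^{-1} C_{12} C_{22.1}^{-1},$$

$$s_{22} = E(R_1^{-1}(1 - B) \mid Y)\, C_{22.1}^{-1}.$$

Using the formula

$$V(X\beta \mid Y) = E[V(X\beta \mid Y, B, R_1) \mid Y] + V[E(X\beta \mid Y, B, R_1) \mid Y], \qquad (2.87)$$

one can find $V(X\beta \mid Y)$ from (2.85) to (2.87).

It follows from (2.82) that

$$E(R_1^{-1} \mid b, y) = \frac{SSE + a_1 + b\,SSH + a_2 b (1 - b)^{-1}}{n - p_1 + f_1 + f_2 - 2}. \qquad (2.88)$$

The calculations for (2.86) can now be completed from (2.83) and (2.88).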

To compare the HB estimator given in (2.84) with the EB estimator of Section 2.2, consider the case when $a_1 = 0$, $f_2 = -2$ and $a_2 = 0$. Thus, the prior distributions on $R_1$ as well as $R_2$ are improper. Then the posterior distribution given in (2.83) simplifies to

$$f(b \mid y) \propto b^{\frac{1}{2}p_2 - 2} (SSE + b\,SSH)^{-\frac{1}{2}(n - p_1 + f_1 - 2)}. \qquad (2.89)$$

From (2.89), one gets

$$E(B \mid Y) = \frac{\int_0^1 b^{\frac{1}{2}p_2 - 1} (SSE + b\,SSH)^{-\frac{1}{2}(n - p_1 + f_1 - 2)}\, db}{\int_0^1 b^{\frac{1}{2}p_2 - 2} (SSE + b\,SSH)^{-\frac{1}{2}(n - p_1 + f_1 - 2)}\, db}. \qquad (2.90)$$

This can be written in the form $\phi(F)/F$ mentioned in Section 2.2. Also, instead of taking the posterior mean of $B$, suppose one considers the posterior mode. The posterior mode or the Type II ML-estimator of $B$ as obtained from (2.89) is

$$\hat B_{MO} = \min\left\{\frac{(n - p + 2)(p_2 - 4)}{(n - p + f_1 + 2) F},\ 1\right\}, \qquad (2.91)$$

if $p_2 \geq 5$. Thus the estimator $X_1[\tilde\beta_1 + (1 - \hat B_{MO})(\hat\beta_1 - \tilde\beta_1)] + X_2 (1 - \hat B_{MO})\hat\beta_2$ is a special member of the class of estimators $e^+_\phi$ of $X\beta$.

2.4  Numerical  Examples

This section illustrates numerically some of the estimation results derived in the previous sections. Monte-Carlo experiments were conducted to assess the frequentist as well as the Bayes risks of the different estimators. In addition, real data examples were used to illustrate the applicability of the proposed methods.

Two situations were considered in the Monte-Carlo experiments. The first involved a $2^3$ factorial experiment replicated twice. Suppose the experimenter assumes a response surface model of the form


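$$E(Y_{ij}) = \gamma_0 + \gamma_1\xi_{1i} + \gamma_2\xi_{2i} + \gamma_3\xi_{3i} + \gamma_{12}\xi_{1i}\xi_{2i} + \gamma_{13}\xi_{1i}\xi_{3i} + \gamma_{23}\xi_{2i}\xi_{3i} + \gamma_{123}\xi_{1i}\xi_{2i}\xi_{3i},$$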




where $i = 1, \ldots, 8$ (treatment) and $j = 1, 2$ (replication), $\xi_{1i} = 1$ for $i = 1, 2, 3, 4$, corresponding to the high level, and $\xi_{1i} = -1$ for $i = 5, 6, 7, 8$, corresponding to the low level, of the variable (factor) $\xi_1$. Also $\xi_{2i} = 1$ for $i = 1, 2, 5, 6$, $\xi_{2i} = -1$ for $i = 3, 4, 7, 8$, $\xi_{3i} = 1$ for $i = 1, 3, 5, 7$ and $\xi_{3i} = -1$ for $i = 2, 4, 6, 8$. For $\beta_1^T = (\gamma_0, \gamma_1, \gamma_2, \gamma_3)$ and $\beta_2^T = (\gamma_{12}, \gamma_{13}, \gamma_{23}, \gamma_{123})$, then, $p_1 = p_2 = 4$ and $n = 16$.


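The design matrix $X$ can be partitioned as $X = (X_1, X_2)$, where

$$X_1^T = \left(\begin{array}{rrrrrrrrrrrrrrrr}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 \\
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 & 1 & -1 & -1
\end{array}\right),$$

$$X_2^T = \left(\begin{array}{rrrrrrrrrrrrrrrr}
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 \\
1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 & 1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 & 1 & 1 & 1 & -1 & -1
\end{array}\right).$$

Note that $X^T X = 16 \times I_8$. It is assumed that $V = I_{16}$ and $Q = I_{16}$ here.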
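
The coding described above can be checked mechanically. The following is a minimal sketch, assuming the two replications of each treatment occupy adjacent rows (the ordering suggested by the displayed matrix); the function name `design_2cubed` is illustrative and not from the original study.

```python
import itertools
import numpy as np

def design_2cubed(replicates=2):
    """Design matrix X = (X1, X2) for the replicated 2^3 factorial."""
    rows = []
    # itertools.product yields (1,1,1), (1,1,-1), ..., (-1,-1,-1),
    # which matches the treatment order i = 1, ..., 8 given in the text.
    for x1, x2, x3 in itertools.product([1, -1], repeat=3):
        row = [1, x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3]
        rows.extend([row] * replicates)  # replications are adjacent rows
    return np.array(rows, dtype=float)

X = design_2cubed()
X1, X2 = X[:, :4], X[:, 4:]                    # main effects / interactions
print(X.shape)                                 # (16, 8)
print(np.allclose(X.T @ X, 16 * np.eye(8)))    # True
```

Because the columns are mutually orthogonal, $C_{12} = 0$ and $C_{22.1} = C_{22} = 16 I_4$ for this design.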

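The second example, a $3^2$ factorial experiment, involved two factors each at three levels. This is taken from Myers (1976). It assumes a response surface model of the form

$$E(Y_i) = \gamma_0 + \gamma_1\xi_{1i} + \gamma_2\xi_{2i} + \gamma_{11}\xi_{1i}^2 + \gamma_{22}\xi_{2i}^2 + \gamma_{12}\xi_{1i}\xi_{2i},$$

$i = 1, \ldots, 9$, where $\xi_{1i} = -1$ for $i = 1, 2, 3$, corresponding to the low level, $\xi_{1i} = 0$ for $i = 4, 5, 6$, corresponding to the central level, and $\xi_{1i} = 1$ for $i = 7, 8, 9$, corresponding to the high level, of the variable $\xi_1$. Also, $\xi_{2i} = -1$ for $i = 1, 4, 7$; $\xi_{2i} = 0$ for $i = 2, 5$ and $8$, and $\xi_{2i} = 1$ for $i = 3, 6$ and $9$. Here $\beta_1^T = (\gamma_0, \gamma_1, \gamma_2)$ comprises the general effect and the two main effects, while $\beta_2^T = (\gamma_{11}, \gamma_{22}, \gamma_{12})$ comprises the second order interactions. Thus $p_1 = p_2 = 3$ and $n = 9$. One can partition the design matrix $X$ as $X = (X_1, X_2)$, where

$$X_1^T = \left(\begin{array}{rrrrrrrrr}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
-1 & -1 & -1 & 0 & 0 & 0 & 1 & 1 & 1 \\
-1 & 0 & 1 & -1 & 0 & 1 & -1 & 0 & 1
\end{array}\right), \qquad
X_2^T = \left(\begin{array}{rrrrrrrrr}
1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 0 & 1 & 1 & 0 & 1 & 1 & 0 & 1 \\
1 & 0 & -1 & 0 & 0 & 0 & -1 & 0 & 1
\end{array}\right).$$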



Table 2.1: The estimated frequentist risk results for $3^2$ factorial experiment.

The estimator      beta = (3 0 1 1 2 3)    beta = (3 0 1 0 0 0)
of X beta
LSE                5.99478                 6.03603
RLSE               48.9807                 2.97617
e_EB(Y)            5.99216                 5.32031
e+_EB(Y)           5.99216                 5.16984
e_HB(Y)            5.99877                 5.33225
e_PTE.10(Y)        7.81844                 3.60394
e_MEB.10(Y)        7.79382                 3.57963
e+_MEB.10(Y)       7.79382                 3.57963
e_PTE.05(Y)        13.8542                 3.0364
e_MEB.05(Y)        13.8169                 3.29718
e+_MEB.05(Y)       13.8169                 3.29718
e_PTE.01(Y)        34.2084                 3.10328
e_MEB.01(Y)        34.1966                 3.10220
e+_MEB.01(Y)       34.1966                 3.10220

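Also, $V = I_9$ and $Q = I_9$ are assumed in this example. Here $C_{11} = \mathrm{Diag}(9, 6, 6)$,

$$C_{22} = \begin{pmatrix} 6 & 4 & 0 \\ 4 & 6 & 0 \\ 0 & 0 & 4 \end{pmatrix} \quad \text{and} \quad C_{12} = \begin{pmatrix} 6 & 6 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$

from which one computes $C_{22.1} = \mathrm{Diag}(2, 2, 4)$.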
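
These matrices follow directly from the level coding of the $3^2$ design. A short numerical check, written as a sketch with illustrative variable names and taking $C_{ij} = X_i^T X_j$ (which is what $V = I_9$ implies), is:

```python
import numpy as np

# Level coding of the 3^2 design: xi_1 changes slowly, xi_2 changes fast.
x1 = np.repeat([-1.0, 0.0, 1.0], 3)
x2 = np.tile([-1.0, 0.0, 1.0], 3)

X1 = np.column_stack([np.ones(9), x1, x2])       # general effect, main effects
X2 = np.column_stack([x1**2, x2**2, x1 * x2])    # second order terms

C11 = X1.T @ X1
C12 = X1.T @ X2
C22 = X2.T @ X2
# C22.1 = C22 - C21 C11^{-1} C12, with C21 = C12^T
C221 = C22 - C12.T @ np.linalg.solve(C11, C12)

print(np.diag(C11))    # [9. 6. 6.]
print(np.diag(C221))   # [2. 2. 4.]
```

The off-diagonal entries of `C221` vanish, so the interaction sum of squares for this design separates into three independent components.
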

For a given $\beta$, a random vector $Y$ was generated from $N(X\beta, I_n)$ in each replication. The sampling procedure was repeated 1,000 times. For estimating $X\beta$, the estimated frequentist risks, i.e., the average squared error losses of the least squares estimators (LSE, full model and RLSE, reduced model), the empirical Bayes estimator ($e_{EB}(Y)$), the positive part empirical Bayes estimator ($e^+_{EB}(Y)$), the hierarchical Bayes estimator ($e_{HB}(Y)$, as given in (2.90) with $f_1 = 2$), the preliminary test estimators ($e_{PTE.\alpha}(Y)$'s, $\alpha$ being the significance level, $\alpha = .10, .05, .01$), the modified empirical Bayes estimators ($e_{MEB.\alpha}(Y)$'s) and the positive part modified empirical Bayes estimators ($e^+_{MEB.\alpha}(Y)$'s) are compared. The results for the $3^2$ experiment are given in Table 2.1, while the results for the $2^3$ experiment are given in Table 2.2.
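
The sampling scheme just described can be sketched for the two least squares estimators on the $3^2$ design. This is an illustrative reimplementation, not the original program: the function names, the seed and the use of NumPy's generator are assumptions, so the numbers agree with Table 2.1 only up to Monte-Carlo error.

```python
import numpy as np

def design_3squared():
    """Return (X1, X2) for the 3^2 factorial with the coding from the text."""
    x1 = np.repeat([-1.0, 0.0, 1.0], 3)
    x2 = np.tile([-1.0, 0.0, 1.0], 3)
    X1 = np.column_stack([np.ones(9), x1, x2])
    X2 = np.column_stack([x1**2, x2**2, x1 * x2])
    return X1, X2

def estimated_risks(beta, reps=1000, seed=0):
    """Average squared error loss of the LSE and the RLSE of X beta."""
    X1, X2 = design_3squared()
    X = np.hstack([X1, X2])
    H_full = X @ np.linalg.pinv(X)     # projection onto the columns of X
    H_red = X1 @ np.linalg.pinv(X1)    # projection onto the columns of X1
    rng = np.random.default_rng(seed)  # seed is an arbitrary choice
    mean = X @ np.asarray(beta, dtype=float)
    Y = mean + rng.standard_normal((reps, X.shape[0]))  # rows are Y ~ N(X beta, I)
    # The projections are symmetric, so Y @ H applies H to every row of Y.
    loss_lse = np.sum((Y @ H_full - mean) ** 2, axis=1)
    loss_rlse = np.sum((Y @ H_red - mean) ** 2, axis=1)
    return loss_lse.mean(), loss_rlse.mean()

print(estimated_risks([3, 0, 1, 0, 0, 0]))  # roughly (6, 3)
print(estimated_risks([3, 0, 1, 1, 2, 3]))  # roughly (6, 49)
```

The shrinkage and preliminary test estimators would be added to the same loop by computing the $F$ ratio for each simulated `Y`.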




Table 2.2: The estimated frequentist risk results for $2^3$ factorial experiment replicated two times.

The estimator      beta = (-1 4 0 3 2 1 5 -2)    beta = (-1 4 0 3 0 0 0 0)
of X beta
LSE                7.77952                       7.92404
RLSE               547.88700                     3.93829
e_EB(Y)            7.77666                       6.17508
e+_EB(Y)           7.77666                       5.83407
e_HB(Y)            7.78255                       6.18972
e_PTE.10(Y)        7.77952                       4.81253
e_MEB.10(Y)        7.77666                       4.65800
e+_MEB.10(Y)       7.77666                       4.65800
e_PTE.05(Y)        7.77952                       4.47572
e_MEB.05(Y)        7.77666                       4.39852
e+_MEB.05(Y)       7.77666                       4.39852
e_PTE.01(Y)        7.77952                       4.05713
e_MEB.01(Y)        7.77666                       4.04630
e+_MEB.01(Y)       7.77666                       4.04630

Note: For $\beta^T = (-1\ 4\ 0\ 3\ 2\ 1\ 5\ -2)$, during the 1,000 replications, the null hypothesis, $H_0: \beta_2 = 0$, was rejected in every replication for all $\alpha = .10$, $.05$ and $.01$. Hence, $e_{PTE.\alpha}(Y) = $ LSE, $e_{MEB.\alpha}(Y) = e_{EB}(Y)$ and $e^+_{MEB.\alpha}(Y) = e^+_{EB}(Y)$.




When $\beta_2 = 0$, the LSE of $X\beta$ under the reduced model is the best estimator of $X\beta$. However, substantial improvement in the estimated frequentist risk of $e_{EB}$ or $e^+_{EB}$ over $X\hat{\beta}$, the full model LSE, was noticed. A similar phenomenon occurred with the hierarchical Bayes estimator. Other HB estimators with $f_1 = 1$ and $4$ were tried. The results are not reported, because they are very similar to the case $f_1 = 2$. Note also that the estimated frequentist risks of the PTE's, the MEB's and the MEB$^+$'s all decrease as the significance level $\alpha$ is lowered from .10 to .01. This is because the larger the critical value, the smaller the chance of rejecting $H_0: \beta_2 = 0$.

When $\beta_2 \neq 0$, $e_{EB}$ or $e^+_{EB}$ are the estimators having the smallest frequentist risks. Admittedly, the risk improvement is not significant over the full model LSE. However, there is considerable risk improvement over the reduced model LSE, as well as the different PTE's, and the modified $e_{EB}$'s or $e^+_{EB}$'s. In this case, the ordering of the estimated frequentist risks of the PTE's, the MEB's and the MEB$^+$'s is reversed. This can be accounted for by the same argument given at the end of the previous paragraph.

Next the Bayes risks were considered. For each replication, $\beta$ was generated from $N(\nu, \tau^2(X^T X)^{-1})$ for a given $\nu$ and $\tau^2$. For a generated $\beta$ and a given $\sigma^2$, a random vector $Y$ was then generated from $N(X\beta, \sigma^2 I_n)$. Once again, the sampling procedure was repeated 1,000 times. The average, over 1,000 replications, squared error losses of the different estimators of $X\beta$ were compared. Tables 2.3 and 2.4 display the estimated Bayes risks of the different estimators.

When $\tau^2 < \sigma^2$, i.e. the prior variability is smaller than the sampling variability, the estimated Bayes risks of the different estimators maintain the same order as the corresponding estimated frequentist risks. The case $\nu_2 = 0$ ($\nu_2 \neq 0$) corresponds to the case $\beta_2 = 0$ ($\beta_2 \neq 0$). However, when $\tau^2 > \sigma^2$, even though $\nu_2 = 0$, the reduced model LSE, the PTE's and the modified $e_{EB}$'s or $e^+_{EB}$'s all perform poorly as compared to the full model LSE, $e_{EB}$ or $e_{HB}$. The reason is that increased



Table 2.3: The estimated Bayes risk results for $3^2$ factorial experiment.

The estimator    nu = (2 3 4 1 1 1), sigma^2 = 1         nu = (2 3 4 0 0 0), sigma^2 = 1
of X beta        tau^2 = 4   tau^2 = 1   tau^2 = .25     tau^2 = 4   tau^2 = 1   tau^2 = .25
LSE              5.95497     5.95497     5.95497         5.95497     5.95497     5.95497
RLSE             22.8549     13.9354     11.7289         15.0423     6.02910     3.77581
e_EB(Y)          5.89124     5.87080     5.88644         5.82264     5.60552     5.40601
e+_EB(Y)         5.87863     5.86034     5.87390         5.79278     5.53087     5.27738
e_HB(Y)          5.80484     5.73912     5.72836         5.69700     5.43529     5.26267
e_PTE.10(Y)      9.04118     9.47173     9.25189         8.87426     6.06646     4.31691
e_MEB.10(Y)      8.97500     9.40509     9.18729         8.81221     6.02341     4.28660
e+_MEB.10(Y)     8.97500     9.40509     9.18729         8.81221     6.02341     4.28660
e_PTE.05(Y)      12.4347     11.0010     10.2346         10.5683     6.06056     4.07614
e_MEB.05(Y)      12.3974     10.9697     10.2076         10.5347     6.04193     4.06642
e+_MEB.05(Y)     12.3974     10.9796     10.2076         10.5347     6.04193     4.06642
e_PTE.01(Y)      19.0021     13.0864     11.3694         13.6495     6.03266     3.84518
e_MEB.01(Y)      18.9955     13.0824     11.3662         13.6454     6.03266     3.84450
e+_MEB.01(Y)     18.9955     13.0824     11.3662         13.6454     6.14803     3.84450

prior variability overshadows the credibility of the prior mean. In this situation, the EB or the positive part EB estimator dominates both the LSE and the RLSE. The domination is much more pronounced in the case when $\nu_2$ indeed equals $0$.
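
The RLSE rows of Tables 2.1 to 2.4 can be checked against a closed form. The formula below is a derivation added here, not one stated in the text: with $V = I$, the reduced-model fit is the projection of $Y$ on the columns of $X_1$, so its risk is $p_1\sigma^2 + \beta_2^T C_{22.1}\beta_2$, and averaging over $\beta \sim N(\nu, \tau^2(X^TX)^{-1})$ adds $p_2\tau^2$ and replaces $\beta_2$ by $\nu_2$. The function names are illustrative.

```python
import numpy as np

def rlse_frequentist_risk(sigma2, p1, C221, beta2):
    """E||X1 beta1_tilde - X beta||^2 = p1*sigma^2 + beta2' C22.1 beta2."""
    beta2 = np.asarray(beta2, dtype=float)
    return sigma2 * p1 + beta2 @ C221 @ beta2

def rlse_bayes_risk(sigma2, tau2, p1, C221, nu2):
    """Same risk averaged over the prior, where beta2 ~ N(nu2, tau^2 C22.1^{-1})."""
    nu2 = np.asarray(nu2, dtype=float)
    return sigma2 * p1 + tau2 * len(nu2) + nu2 @ C221 @ nu2

C221_3sq = np.diag([2.0, 2.0, 4.0])   # 3^2 design
C221_2cu = 16.0 * np.eye(4)           # replicated 2^3 design

print(rlse_frequentist_risk(1.0, 3, C221_3sq, [1, 2, 3]))      # 49.0   (Table 2.1: 48.9807)
print(rlse_frequentist_risk(1.0, 4, C221_2cu, [2, 1, 5, -2]))  # 548.0  (Table 2.2: 547.887)
print(rlse_bayes_risk(1.0, 4.0, 3, C221_3sq, [0, 0, 0]))       # 15.0   (Table 2.3: 15.0423)
print(rlse_bayes_risk(4.0, 9.0, 4, C221_2cu, [5, 2, 7, 4]))    # 1556.0 (Table 2.4: 1556.05)
```

The agreement with the simulated values suggests that the reduced-model risk grows linearly in $\tau^2$, which is why the RLSE deteriorates once the prior variability exceeds the sampling variability.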

Next, we investigate how these estimators perform on some real data. Professor James Barret and Terril Nell of the University of Florida investigated the effects of growth retardant chemicals at various rates of application on heights of chrysanthemum plants (personal communication). Their experiment was a randomized complete block design with five blocks. Two chemicals were each applied at six rates, giving a $2 \times 6$ factorial treatment structure. The 12 treatments were applied to two plots in each block. The estimation methods discussed in this paper will be illustrated using data from two of the five blocks, block one and block five. Separate illustrations are presented for blocks 1 and 5 because quite different assessments of chemical $\times$ rate interaction are revealed in the two blocks. Data are presented in




Table 2.4: The estimated Bayes risk results for $2^3$ factorial experiment replicated two times.

The estimator    nu = (-4 6 8 3 5 2 7 4), sigma^2 = 4    nu = (-4 6 8 3 0 0 0 0), sigma^2 = 4
of X beta        tau^2 = 9   tau^2 = 4   tau^2 = 1       tau^2 = 9   tau^2 = 4   tau^2 = 1
LSE              31.7548     31.7548     31.7548         31.7548     31.7548     31.7548
RLSE             1556.05     1536.09     1524.16         52.3833     32.3134     20.2714
e_EB(Y)          31.7530     31.7526     31.7522         30.4188     29.5948     27.1787
e+_EB(Y)         31.7530     31.7526     31.7522         29.6207     27.8240     25.1256
e_HB(Y)          31.7892     31.7895     31.7896         28.9510     27.3352     25.4736
e_PTE.10(Y)      31.7548     31.7548     31.7548         37.1159     32.1043     23.1223
e_MEB.10(Y)      31.7530     31.7526     31.7522         35.7726     31.0050     22.4638
e+_MEB.10(Y)     31.7530     31.7526     31.7522         35.7726     31.0050     22.4638
e_PTE.05(Y)      31.7548     31.7548     31.7548         40.5139     32.1927     21.9560
e_MEB.05(Y)      31.7530     31.7526     31.7522         39.6051     31.6586     21.6440
e+_MEB.05(Y)     31.7530     31.7526     31.7522         39.6051     31.6586     21.6440
e_PTE.01(Y)      31.7548     31.7548     31.7548         47.0679     32.2514     20.6613
e_MEB.01(Y)      31.7530     31.7526     31.7522         46.7918     32.1120     20.6191
e+_MEB.01(Y)     31.7530     31.7526     31.7522         46.7918     32.1120     20.6191

Note: For $\nu^T = (-4\ 6\ 8\ 3\ 5\ 2\ 7\ 4)$, during the 1,000 replications, the null hypothesis, $H_0: \beta_2 = 0$, was rejected in every replication for all $\alpha = .10$, $.05$ and $.01$. Hence, $e_{PTE.\alpha}(Y) = $ LSE, $e_{MEB.\alpha}(Y) = e_{EB}(Y)$ and $e^+_{MEB.\alpha}(Y) = e^+_{EB}(Y)$.




Table 2.5: Six-week heights of chrysanthemum plants in pots treated with two growth retardant chemicals at six rates of application.

Block 1
                              Rate
Chemical       5      10      20      40      80     160
   1         55.0    47.0    52.5    46.5    47.0    37.0
             57.5    54.0    51.0    44.5    41.0    48.0
   2         54.0    47.5    38.0    44.0    25.0    4.50
             51.0    53.5    44.5    26.5    23.5    21.5

Block 5
                              Rate
Chemical       5      10      20      40      80     160
   1         57.5    50.5    50.0    47.5    35.5    20.0
             48.0    55.0    53.0    32.5    40.0    32.0
   2         45.0    52.5    32.0    43.0    31.0    26.0
             56.0    50.0    45.0    37.0    28.0    22.0

Table 2.5. The continuous nature of the variable rate is not incorporated in the results to be reported in this paper, because we use the example to illustrate the behavior of the estimators rather than to illustrate how the data should be analyzed for publication in the horticultural literature.

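The model assumed for a given block is

$$y_{ijk} = \mu + \alpha_i + \gamma_j + \delta_{ij} + \epsilon_{ijk}, \tag{2.92}$$

where $y_{ijk}$ is the six week height of the $k$th plant treated with rate $j$ of chemical $i$. We parameterize the model with $\sum_i \alpha_i = \sum_j \gamma_j = \sum_i \delta_{ij} = \sum_j \delta_{ij} = 0$, thereby eliminating any parameter with subscript $i = 2$ or $j = 6$. Thus $\beta_1^T = (\mu, \alpha_1, \gamma_1, \gamma_2, \gamma_3, \gamma_4, \gamma_5)$, $\beta_2^T = (\delta_{11}, \delta_{12}, \delta_{13}, \delta_{14}, \delta_{15})$ and $\beta^T = (\beta_1^T, \beta_2^T)$.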




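In matrix notation, the model is

$$Y = X\beta + \epsilon, \tag{2.93}$$

where $Y = (y_{111}, y_{112}, y_{121}, y_{122}, \ldots, y_{161}, y_{162}, y_{211}, y_{212}, \ldots, y_{261}, y_{262})^T$, $\epsilon$ is the correspondingly ordered vector of the $\epsilon_{ijk}$, $\beta = (\mu, \alpha_1, \gamma_1, \gamma_2, \gamma_3, \gamma_4, \gamma_5, \delta_{11}, \delta_{12}, \delta_{13}, \delta_{14}, \delta_{15})^T$, and

$$X = \left(\begin{array}{rrrrrrrrrrrr}
1 & 1 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 \\
1 & 1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 \\
1 & 1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 \\
1 & -1 & 1 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 \\
1 & -1 & 1 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 \\
1 & -1 & 0 & 1 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 \\
1 & -1 & 0 & 1 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 \\
1 & -1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & -1 & 0 & 0 \\
1 & -1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & -1 & 0 & 0 \\
1 & -1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & -1 & 0 \\
1 & -1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & -1 & 0 \\
1 & -1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & -1 \\
1 & -1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & -1 \\
1 & -1 & -1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 & 1 \\
1 & -1 & -1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 & 1
\end{array}\right).$$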

The null hypothesis $H_0: \beta_2 = 0$ specifies no interaction between chemical and rate. Table 2.6 contains $F$ statistics and observed significance probabilities for $H_0: \beta_2 = 0$ for each block. Also included in Table 2.6 are the sums of squared residuals from the LSE for the full model ($SSE_{LSE}$), the LSE for the reduced model ($SSE_{RLSE}$), the EB estimator ($SSE_{EB}$) and the hierarchical Bayes estimator with $f_1 = 2$ ($SSE_{HB}$). The Euclidean distances between the EB or the HB estimators and the least squares estimators for the full as well as the reduced models are also provided. These distances are denoted by $D_{LSE,EB}$, $D_{RLSE,EB}$, $D_{LSE,HB}$ and $D_{RLSE,HB}$ ($D_{LSE,EB} = \| X\hat{\beta} - e_{EB}(Y) \|^2$, etc.). The test statistic for interaction ($F = 3.10$, $p = 0.0503$) in block one is essentially significant at the .05 level ($F_{.05} = 3.11$), but the test statistic for interaction ($F = .70$, $p = .6318$) in block five is




Table 2.6: Sum of squared errors and the Euclidean distances for different estimators.

                         Block 1                Block 5
F for H0: beta_2 = 0     3.10 (p = 0.0530)      0.70 (p = 0.6318)
SSE_LSE                  451.6                  433.0
SSE_RLSE                 1034.8                 559.9
SSE_EB                   467.7                  500.9
SSE_HB                   471.7                  458.2
D_LSE,EB                 16.10                  67.90
D_RLSE,EB                405.7                  9.20
D_LSE,HB                 20.10                  25.20
D_RLSE,HB                386.7                  39.0

not significant at any meaningful level. Correspondingly, in block one the sum of squared residuals for the reduced model (1034.8) is much larger than the sum of squared residuals for the full model (451.6), and in block five the sum of squares for the reduced model (559.9) is only nominally larger than the sum of squared residuals for the full model (433.0). But in both blocks, the sum of squared residuals for the EB estimator (467.7 in block one and 500.9 in block five) is not much larger than the sum of squared residuals for the complete model. For the HB estimator with $f_1 = 2$, the sum of squared residuals (471.7 in block one and 458.2 in block five) behaves similarly to that of the EB estimator.

The Euclidean distances in block one ($D_{LSE,EB} = 16.1$, $D_{RLSE,EB} = 405.7$, $D_{LSE,HB} = 20.1$ and $D_{RLSE,HB} = 386.7$) show that the distances between the EB or the HB estimators and the LSE for the full model are much smaller than the distances between the EB or the HB estimators and the LSE for the reduced model when interaction is present.

But in block five the Euclidean distances ($D_{LSE,EB} = 67.9$, $D_{RLSE,EB} = 9.2$, $D_{LSE,HB} = 25.2$ and $D_{RLSE,HB} = 39.0$) show that the distance between the EB estimator and the LSE for the reduced model is less than the distance between the EB estimator and the LSE for the full model when the interaction is insignificant. But surprisingly, the HB estimator is still closer to the full model LSE than to the reduced model LSE.
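
Several entries of Table 2.6 can be recomputed from the data in Table 2.5. The sketch below fits the full and reduced models with the sum-to-zero coding of (2.93) and forms an EB fit. The shrinkage factor $\hat{B} = (p_2 - 2)\,SSE/\{(n - p + 2)\,SSH\}$ is an assumption here, since the Section 2.2 formula is not restated in this section; it was chosen because it reproduces the tabulated $SSE_{EB}$ values. All function and variable names are illustrative.

```python
import numpy as np

# Heights from Table 2.5, indexed as [chemical, replicate, rate].
BLOCK1 = np.array([
    [[55.0, 47.0, 52.5, 46.5, 47.0, 37.0], [57.5, 54.0, 51.0, 44.5, 41.0, 48.0]],
    [[54.0, 47.5, 38.0, 44.0, 25.0, 4.5], [51.0, 53.5, 44.5, 26.5, 23.5, 21.5]],
])
BLOCK5 = np.array([
    [[57.5, 50.5, 50.0, 47.5, 35.5, 20.0], [48.0, 55.0, 53.0, 32.5, 40.0, 32.0]],
    [[45.0, 52.5, 32.0, 43.0, 31.0, 26.0], [56.0, 50.0, 45.0, 37.0, 28.0, 22.0]],
])

def design():
    """24 x 12 design matrix of (2.93): mu, alpha_1, gamma_1..5, delta_11..15."""
    rows = []
    for i in range(2):
        a = 1.0 if i == 0 else -1.0                       # chemical contrast
        for j in range(6):
            g = -np.ones(5) if j == 5 else np.eye(5)[j]   # rate contrasts
            for _ in range(2):                            # two plots per cell
                rows.append(np.concatenate(([1.0, a], g, a * g)))
    return np.array(rows)

def analyse(block):
    X = design()
    X1 = X[:, :7]                                         # reduced (additive) model
    y = np.array([block[i, k, j]
                  for i in range(2) for j in range(6) for k in range(2)])
    n, p = X.shape
    p2 = p - X1.shape[1]
    full = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    reduced = X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]
    sse = float(np.sum((y - full) ** 2))
    ssh = float(np.sum((full - reduced) ** 2))
    b_hat = (p2 - 2) * sse / ((n - p + 2) * ssh)          # ASSUMED EB shrinkage factor
    eb = reduced + (1.0 - b_hat) * (full - reduced)
    return {"sse_full": sse, "sse_reduced": sse + ssh,
            "f": (ssh / p2) / (sse / (n - p)),
            "b_hat": b_hat, "sse_eb": float(np.sum((y - eb) ** 2))}

for name, block in (("block 1", BLOCK1), ("block 5", BLOCK5)):
    print(name, {k: round(v, 2) for k, v in analyse(block).items()})
```

Under this assumed form the shrinkage factor is about 0.17 in block one and about 0.73 in block five, which matches the qualitative description: the EB fit stays near the full model when interaction is present and moves toward the additive model when it is not.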

Predictive values for the full model ($X\hat{\beta}$), for the reduced model ($X_1\tilde{\beta}_1$), for the EB estimator $e_{EB}(Y)$ and for the HB estimator $e_{HB}(Y)$ are presented in Table 2.7 and plotted in Figure 2.1. In block one, wherein interaction is present, the EB and the HB estimates differ relatively little from the LSE's for the full model. But in block five, wherein interaction is insignificant, the EB estimates are close to the LSE's for the reduced model.

Results for the preliminary test estimators, $e_{PTE.10}(Y)$, $e_{PTE.05}(Y)$, and $e_{PTE.01}(Y)$ are not shown explicitly in Table 2.7 because their values are equal to either the LSE $X\hat{\beta}$ or the RLSE $X_1\tilde{\beta}_1$, depending on the level of significance of the $F$ test for interaction. In block five the $F$ test was not significant at any meaningful level; so all the preliminary test estimators are equal to the LSE for the reduced model. In block one, the test for interaction ($F = 3.10$) fell short of significance at the .05 level ($F_{.05} = 3.11$) by a very narrow margin. Thus $e_{PTE.05}(Y)$ and $e_{PTE.01}(Y)$ are both equal to the RLSE in block one, but $e_{PTE.10}(Y)$ is equal to the LSE for the full model. This illustrates a fallacy in the PT estimators, especially when $\alpha$ is set at a relatively low value such as .05 or .01. Interaction between chemical and rate is evidently present in block one, although the $F$ test very slightly fails to detect it at the .05 (or smaller) level. This leads to values of $e_{PTE.05}(Y)$ and $e_{PTE.01}(Y)$ that are inferior to the LSE because of the apparent bias. In block one, $e_{PTE.05}(Y)$ has values that are very different from values that would be obtained if $F$ were only slightly larger. The EB or HB estimators do not suffer from this drawback.




Table 2.7: Predicted values for different estimators.

Block 1
Chem    Rate    LSE      RLSE     e_EB(Y)   e_HB(Y)
 1        5     56.25    60.25    56.96     57.04
 1       10     50.50    56.65    51.52     51.64
 1       20     51.75    52.65    51.90     51.92
 1       40     45.50    46.52    45.67     45.69
 1       80     44.00    40.27    43.38     43.31
 1      160     42.50    33.90    41.07     40.90
 2        5     52.50    48.23    51.79     51.71
 2       10     50.50    44.35    49.48     49.36
 2       20     41.25    40.35    41.10     41.08
 2       40     35.25    34.23    35.08     35.06
 2       80     24.25    27.98    24.87     24.94
 2      160     13.00    21.60    14.43     14.60

Block 5
Chem    Rate    LSE      RLSE     e_EB(Y)   e_HB(Y)
 1        5     52.75    53.88    53.57     53.25
 1       10     52.75    54.25    53.85     53.42
 1       20     51.50    47.25    43.89     49.16
 1       40     40.00    42.25    41.65     41.00
 1       80     37.75    35.88    36.38     36.91
 1      160     26.00    27.25    26.91     26.56
 2        5     50.50    49.38    49.68     50.00
 2       10     51.25    49.75    50.15     50.58
 2       20     38.50    42.75    41.61     40.39
 2       40     40.00    37.75    38.35     39.00
 2       80     29.50    31.38    30.87     30.36
 2      160     24.00    22.75    23.09     23.44

[Figure 2.1. Block 1: interaction, and block 5: no interaction.]


CHAPTER  3 

EMPIRICAL  BAYES  ESTIMATION  OF  MULTIVARIATE  REGRESSION 

MODEL  I 

3.1  Introduction 

The linear model for $m$ distinct individuals is often useful in growth and response curve studies. If it is suspected that there are no interaction effects involved, then there may be a choice between the following two models:

$$Y_k = X_{k1}\beta_{k1} + X_{k2}\beta_{k2} + \epsilon_k, \tag{3.1}$$

$$Y_k = X_{k1}\beta_{k1} + \epsilon_k, \tag{3.2}$$

where $Y_k$ is an $n_k \times 1$ vector of responses of the $k$th individual over the $n_k$ different occasions, and the $X_{ki}$'s are the $n_k \times p_i$ design matrices of full rank $p_i < n_k$ for $i = 1, 2$. $\beta_{k1}$ is the $p_1 \times 1$ vector of main effects coefficients, and $\beta_{k2}$ is the $p_2 \times 1$ vector of interaction coefficients for the $k$th individual. The vectors $\epsilon_k$'s are independently distributed as $N(0, \sigma^2 V_k)$, where the $V_k$'s are known p.d. matrices. However, the analysis for the above choice follows trivially from Chapter 2. If in addition the main effects are suspected to be all equal, then there may be a choice between the model (3.1) and the following one:

$$Y_k = X_{k1}\beta_1 + \epsilon_k. \tag{3.3}$$

To determine the choice between the model (3.1) and the model (3.3), a classical way is testing $H_0: \beta_{k1} = \beta_1$ and $\beta_{k2} = 0$ for all $k$. The model (3.1) is utilized if the null hypothesis is rejected at a desired level of significance (for example, $\alpha = .05$), and the model (3.3) is chosen when the opposite is true. However, the above procedure suffers from the drawback that there is no general way of incorporating the degree of evidence for or against the null hypothesis in order to arrive at the estimators.

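For the problem of best fitting of a given set of data by the least squares methods, the model (3.1) is always superior to the model (3.3), i.e.

$$\sum_{k=1}^m \| Y_k - X_k\hat{\beta}_k \|^2 \le \sum_{k=1}^m \| Y_k - X_{k1}\tilde{\beta}_1 \|^2.$$

Note that $\hat{\beta}_k = (X_k^T V_k^{-1} X_k)^{-1} X_k^T V_k^{-1} Y_k$ denotes the LSE of $\beta_k$ based on the model (3.1) and $\tilde{\beta}_1 = (\sum_{k=1}^m C_{k11})^{-1} \sum_{k=1}^m C_{k11}\tilde{\beta}_{k1}$ with $C_{k11} = X_{k1}^T V_k^{-1} X_{k1}$ and $\tilde{\beta}_{k1} = C_{k11}^{-1} X_{k1}^T V_k^{-1} Y_k$. However, when estimators of $(X\beta)^* = ((X_1\beta_1)^T, \ldots, (X_m\beta_m)^T)^T$ are compared on the basis of general quadratic loss,

$$L(X_1\beta_1, \ldots, X_m\beta_m;\ a_1, \ldots, a_m) = \sum_{k=1}^m (a_k - X_k\beta_k)^T Q_k (a_k - X_k\beta_k), \tag{3.4}$$

where the $Q_k$'s are some known p.d. matrices, the LSE of $(X\beta)^*$ based on the model (3.1) may lose their superiority.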

In this chapter, the empirical Bayes (EB) estimators of $(X\beta)^*$ are proposed by some given appropriate g-priors. The EB estimators serve as a weighted average of the LSE of $X_k\beta_k$, $k = 1, \ldots, m$, based on the model (3.1) and the model (3.3). This is of practical benefit especially when the weights are chosen such that the compromise estimator of $X_k\beta_k$ leans more towards $X_k\hat{\beta}_k$ for all $k$ if the observed $F$ ratio is large, and towards $X_{k1}\tilde{\beta}_1$ if the observed $F$ ratio is small. In Chapter 4, a choice between the model (3.1) and another reduced multivariate model will be discussed; also, for reaching a proper empirical Bayes (EB) estimator, a different g-prior, also based on the prior derived by Reinsel (1985), will be discussed.

A popular method of achieving a compromise based on the full and the reduced models is the use of the preliminary test estimators (PTE) based on the rejection or acceptance of the null hypothesis $H_0$. It is shown in Section 3.2 that for every PTE of $X_k\beta_k$'s, there is a corresponding modified EB estimator which dominates the PTE. This is not a surprising consequence, since the PTE also does not take into account the degree of evidence for or against the null hypothesis $H_0: \beta_{k1} = \beta_1$ and $\beta_{k2} = 0$ for all $k$.




Section 3.2 introduces the EB estimators and their positive part versions. Several theorems are given which describe the frequentist as well as the Bayesian properties of the proposed estimators. It is shown that there are EB estimators of $(X\beta)^*$ which always dominate $((X_1\hat{\beta}_1)^T, \ldots, (X_m\hat{\beta}_m)^T)^T$ under squared error loss.

Section 3.3 introduces several estimators of $(X\beta)^*$. These estimators are proposed such that they shrink the unrestricted LSE to the restricted LSE, with different shrinkage coefficients for some $k$. Under squared error loss, their frequentist risks and Bayes risks are compared. Finally, in Section 3.4, in a multivariate regression example, the different estimators of $(X\beta)^*$ are compared according to their simulated frequentist and Bayes risks.

3.2  The  Empirical  Bayes  Approach

In order to develop an EB procedure, a Bayes estimator of $X_k\beta_k$ for $k = 1, \ldots, m$ should first be obtained. Based on the null hypothesis $H_0: \beta_{k1} = \beta_1$ and $\beta_{k2} = 0$ for all $k$, a g-prior distribution of $\beta_k$ for the general multivariate regression model in (3.1) is

$$\beta_k \overset{ind.}{\sim} N_p\left(\nu,\ \tau^2 (X_k^T V_k^{-1} X_k)^{-1}\right), \tag{3.5}$$

where $\nu = (\nu_1^T, 0^T)^T$ and $p = p_1 + p_2$, with the dimensions of $\nu_1$ and $0$ equal to $p_1 \times 1$, $p_2 \times 1$, respectively.

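With the above g-prior, the posterior of $\beta_k$ given $Y_k$ is

$$\beta_k \mid Y_k \sim N_p\left(\hat{\beta}_k - B(\hat{\beta}_k - \nu),\ \sigma^2 (1 - B)(X_k^T V_k^{-1} X_k)^{-1}\right), \tag{3.6}$$

where $\hat{\beta}_k = (X_k^T V_k^{-1} X_k)^{-1} X_k^T V_k^{-1} Y_k$ is the BLUE of $\beta_k$, and $B = \sigma^2/(\sigma^2 + \tau^2)$. By expressing $X_k\beta_k = X_{k1}\beta_{k1} + X_{k2}\beta_{k2}$, the Bayes estimator of $X_k\beta_k$ under any quadratic loss (3.4) is

$$e_{kB}(Y_k) = E(X_k\beta_k \mid Y_k) = X_{k1}[\nu_1 + (1 - B)(\hat{\beta}_{k1} - \nu_1)] + (1 - B) X_{k2}\hat{\beta}_{k2}, \quad \text{for } k = 1, \ldots, m. \tag{3.7}$$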



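In an empirical Bayes framework, the parameters $\nu_1$, $\sigma^2$ and $\tau^2$ are unknown, and need to be estimated from the marginal distribution of $Y_1, \ldots, Y_m$. Note that marginally

$$Y_k \overset{ind.}{\sim} N_{n_k}\left(X_{k1}\nu_1,\ \sigma^2 V_k + \tau^2 P_{X_k}\right), \tag{3.8}$$

where $P_{X_k} = X_k (X_k^T V_k^{-1} X_k)^{-1} X_k^T$. By using Exercise 2.9, p.33 of Rao (1973), and recalling $B = \sigma^2/(\sigma^2 + \tau^2)$, it can be shown that

$$(\sigma^2 V_k + \tau^2 P_{X_k})^{-1} = \sigma^{-2}\left[V_k^{-1} - (1 - B) V_k^{-1} P_{X_k} V_k^{-1}\right]. \tag{3.9}$$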
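
The inverse in (3.9) is easy to confirm numerically. The sketch below takes $P_{X_k} = X_k(X_k^T V_k^{-1} X_k)^{-1} X_k^T$ and draws an arbitrary full-rank $X_k$ and p.d. $V_k$; the dimensions, seed and parameter values are illustrative assumptions only.

```python
import numpy as np

def check_inverse_identity(n=7, p=3, sigma2=1.5, tau2=0.7, seed=1):
    """Compare both sides of (sigma^2 V + tau^2 P)^{-1} = sigma^{-2}[V^{-1} - (1-B) V^{-1} P V^{-1}]."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))            # full column rank with probability one
    A = rng.standard_normal((n, n))
    V = A @ A.T + n * np.eye(n)                # a positive definite V
    V_inv = np.linalg.inv(V)
    P = X @ np.linalg.inv(X.T @ V_inv @ X) @ X.T
    B = sigma2 / (sigma2 + tau2)
    lhs = np.linalg.inv(sigma2 * V + tau2 * P)
    rhs = (V_inv - (1.0 - B) * (V_inv @ P @ V_inv)) / sigma2
    idempotent = np.allclose(V_inv @ P @ V_inv @ P, V_inv @ P)
    return np.allclose(lhs, rhs), idempotent

print(check_inverse_identity())   # (True, True)
```

The second flag checks that $V_k^{-1} P_{X_k}$ is idempotent, the property used in the next step of the derivation.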

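Now, let $C_{kij} = X_{ki}^T V_k^{-1} X_{kj}$ and $C_{k22.1} = C_{k22} - C_{k21} C_{k11}^{-1} C_{k12}$ ($i, j = 1, 2$ and $k = 1, \ldots, m$). Under the model (3.3), the BLUE of $\beta_1$ is $\tilde{\beta}_1 = (\sum_{k=1}^m C_{k11})^{-1} \sum_{k=1}^m C_{k11}\tilde{\beta}_{k1}$, with $\tilde{\beta}_{k1} = (X_{k1}^T V_k^{-1} X_{k1})^{-1} X_{k1}^T V_k^{-1} Y_k$. Then, based on the normal equation

$$C_{k11}\tilde{\beta}_{k1} = C_{k11}\hat{\beta}_{k1} + C_{k12}\hat{\beta}_{k2}, \tag{3.10}$$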

the  following  identity  can  be  obtained 

~ ^*1  ~ (3-11) 

Since  = Zw  (t  = 1,2),  using  the  idempotency  of  V_l^Px^,  and 

(3.9),  one  gets 

(i:*  - X»i!;i)^(oVj  + 

(V^-‘  - (1  - 




Thus,  adding  all  k components  of  the  above  equation,  and  using  the  identity  (3.11), 
one  gets 


E(n  - + T^ExJ'Hyk  - Kkif'i) 

k-l 

m 

k=l 
m 

k=l 

^ r 

X]  ^2— *22.1^2^'^’ 


^11  £*12 
£*21  £*22 


ikL~ 


^*2 


*=1 


(3.12) 


where 


fc=l 

m 

= (i^i-4.)"'(Ee*n)(iii-&) 

k=l 

m 

+ E(^*i-^,r£*ii(^*,-^j,  (3.13) 

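and $SSE = \sum_{k=1}^m (Y_k - X_k\hat{\beta}_k)^T V_k^{-1} (Y_k - X_k\hat{\beta}_k)$, the usual error sum of squares.

Let $SSH_1 = \sum_{k=1}^m (\tilde{\beta}_{k1} - \tilde{\beta}_1)^T C_{k11} (\tilde{\beta}_{k1} - \tilde{\beta}_1)$, and $SSH_2 = \sum_{k=1}^m \hat{\beta}_{k2}^T C_{k22.1} \hat{\beta}_{k2}$. The sum of $SSH_1$ and $SSH_2$ is the sum of squares due to the hypothesis $H_0: \beta_{k1} = \beta_1$ and $\beta_{k2} = 0$ for all $k$. From (3.12) and (3.13), it is easy to see that $(\tilde{\beta}_1, SSH_1 + SSH_2, SSE)$ is complete sufficient for $\nu_1, \tau^2, \sigma^2$.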


Define  C — E*Li  £*n-  Since 


^5^1  = Cll-Jl) 


X ( 


£111  • • ' Q 

£iii£  £111  • • • £iii£  ^£,nii 

.0  •••  £.11. 

. £mll£  £111  • • • £mll£  ^£.11 

^11 


■ i-ml  J 




'll 

i^iii 

••  0 

V{ 

■ Li. 

) 

= (a^  + r^) 

0 

—mil  . 

and 


'Cm  ...  0 

■£niS-‘e.u  ■ 

.0  •••  ^11. 

■ £lu  ■ * ■ £mll£  *£mll 

(-1-1 


X 


0 


(-<-1 


4. 


0 


4. 


^111^  * 


(3.14) 


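The right hand side of (3.14) is an idempotent matrix with trace equal to $(m - 1)p_1$. Thus, $SSH_1 \sim (\tau^2 + \sigma^2)\chi^2_{(m-1)p_1}$. Since the $\tilde{\beta}_{k1}$'s are independent of the $\hat{\beta}_{k2}$'s (see Lemma 2 in Ghosh, Saleh, Sen, 1987), and $SSH_2 \sim (\tau^2 + \sigma^2)\chi^2_{mp_2}$, thus, $SSH_1 + SSH_2 \sim (\tau^2 + \sigma^2)\chi^2_{mp - p_1}$. The UMVUE of $\sigma^{-2}B = (\tau^2 + \sigma^2)^{-1}$ is then given by $(mp - p_1 - 2)/(SSH_1 + SSH_2)$ for $(mp - p_1) > 3$ and $p = p_1 + p_2$. The best scale invariant estimate of $\sigma^2$ is $SSE/(N - mp + 2)$, where $N = \sum_{k=1}^m n_k$, and the UMVUE of $\nu_1$ is $\tilde{\beta}_1$. Substitution of all these estimators for the unknown parameters in (3.7) gives

$$e_{kEB}(Y_k) = X_{k1}\left[\tilde{\beta}_1 + \left(1 - \frac{mp - p_1 - 2}{F}\right)(\hat{\beta}_{k1} - \tilde{\beta}_1)\right] + \left(1 - \frac{mp - p_1 - 2}{F}\right) X_{k2}\hat{\beta}_{k2}, \quad k = 1, \ldots, m, \tag{3.15}$$

where $F = (N - mp + 2)(SSH_1 + SSH_2)/SSE$, a constant multiple of the $F$ ratio for testing $H_0: \beta_{k1} = \beta_1$ and $\beta_{k2} = 0$ for all $k$.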

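In practice, it is much more appropriate to use the positive part EB estimator, $e^+_{EB} = (e^{+T}_{1EB}, \ldots, e^{+T}_{mEB})^T$, because $B$ is estimated by the quantity $(mp - p_1 - 2)/F$ which can be greater than 1:

$$e^+_{kEB}(Y_k) = X_{k1}\left[\tilde{\beta}_1 + \left(1 - \frac{mp - p_1 - 2}{F}\right)^+ (\hat{\beta}_{k1} - \tilde{\beta}_1)\right] + \left(1 - \frac{mp - p_1 - 2}{F}\right)^+ X_{k2}\hat{\beta}_{k2}, \tag{3.16}$$

where $a^+ = \max(a, 0)$.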

Note from (3.16) that for very large $F$ values signifying substantial departure from $H_0: \beta_{k1} = \beta_1$ and $\beta_{k2} = 0$ for all $k$, $e^+_{kEB}$ is very close to $X_k\hat{\beta}_k$, whereas for very small $F$ values signifying enough support for $H_0$, $e^+_{kEB}$ is very close to $X_{k1}\tilde{\beta}_1$ for all $k$. When there is no clear-cut decision for or against $H_0$, $e^+_{kEB}$ is in some sense a weighted average of $X_k\hat{\beta}_k$ and $X_{k1}\tilde{\beta}_1$ with the weights being adaptively determined by the data.
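
The weighted-average reading of the positive part estimator can be made concrete. Rearranging the form $X_{k1}[\tilde{\beta}_1 + w(\hat{\beta}_{k1} - \tilde{\beta}_1)] + wX_{k2}\hat{\beta}_{k2}$ with $w = (1 - (mp - p_1 - 2)/F)^+$ gives $X_{k1}\tilde{\beta}_1 + w(X_k\hat{\beta}_k - X_{k1}\tilde{\beta}_1)$, which is what the sketch below computes from the two fitted vectors. The function name and the toy inputs are illustrative assumptions.

```python
import numpy as np

def positive_part_eb(full_fit, reduced_fit, F, m, p, p1):
    """Shrink the full-model fit X_k beta_hat_k toward the reduced fit X_k1 beta_tilde_1.

    F is the constant multiple of the F ratio defined after (3.15).
    """
    full_fit = np.asarray(full_fit, dtype=float)
    reduced_fit = np.asarray(reduced_fit, dtype=float)
    b_hat = (m * p - p1 - 2) / F          # estimate of B, may exceed 1
    weight = max(1.0 - b_hat, 0.0)        # positive part a+ = max(a, 0)
    return reduced_fit + weight * (full_fit - reduced_fit)

full, reduced = [10.0, 20.0], [12.0, 18.0]      # toy fitted values
print(positive_part_eb(full, reduced, F=16.0, m=3, p=4, p1=2))  # [11. 19.]
print(positive_part_eb(full, reduced, F=4.0, m=3, p=4, p1=2))   # [12. 18.]
```

With $mp - p_1 - 2 = 8$, the value $F = 16$ gives equal weights to the two fits, while $F = 4$ would give a negative weight without the positive part, so the estimator falls back to the reduced fit.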

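Next, the proposed estimators are compared according to the criteria of frequentist and Bayes risks. First the frequentist criterion will be utilized.

The general class of estimators of $X_k\beta_k$ is

$$e_{k\phi}(Y_k) = X_{k1}\left[\tilde{\beta}_1 + \left(1 - \frac{\phi(F)}{F}\right)(\hat{\beta}_{k1} - \tilde{\beta}_1)\right] + \left(1 - \frac{\phi(F)}{F}\right) X_{k2}\hat{\beta}_{k2}, \quad k = 1, \ldots, m. \tag{3.17}$$

The following theorem provides the risk expression for $e_\phi$ under the loss (3.4) with $Q_k = V_k^{-1}$. The risk is denoted by $R(X_1\beta_1, \ldots, X_m\beta_m;\ e_{1\phi}, \ldots, e_{m\phi})$.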

Theorem  3.2.1.  If  is  a differentiable  function,  then,  under  the  loss  (3.4)  with 




- - Pi)  + - ^)F} 

+ -^^i- •••.^„.<^^{  ^2  \^^Hi  + SSH2)}. 

Proof  of  Theorem  3.2.1.  By  using  (3.11),  it  can  be  shown  that 


■k2 


Then, 


+ (=^*2  ^klS.kllQ.ku)[  (1  ~ —j^  ^ — 0 

m 

- x,!j  ) 

A;=l 

m 

= e^2„-.,s,„.{I2„  - - ^(1.,-ijr 

+ (l«  - i..  - ^-^hfChnAK  - 

= mpa^  - 2-Bj,,  E(  (£,.  - & - E(|t,)) 

+ El  (2„  - 4 )’'ai.(3„  - 0, ) 

k=l 

+ ^2^22.1^2  1) 

A generalization  of  Stein’s  identity  (Berger  and  Haff,  1983)  gives 

^2..  El  (in  ~ - Eiit,)) 

+ ^i2^22.l(^2  ”^*2)]} 

6 

Y0 

d§_ 

= o'E.  . ,f>l{lIl.'l’(E)^  SSHi  + SSHi 

f 'sSE/(N-mp~2) 


(3.18) 


(3.19) 


(3.20) 


Trl 

-kl 


-*2 




+ — ^(("i-l)pi  + mp2)} 


(3.21) 


Recall  that  SSH,  - Sr=,(^i  -^j),  SSH^  = ajjA,, 

F = {{SSHi  + SSH2)ISSE)[N -mp^2).  Since  •••,  \ 

■>K.m§_^  = mpa^,  (3.18)  can  be  obtained  by  combining  (3.20)  and  (3.21).  The- 
orem 3.2.1  is  hence  proved. 

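The following corollary to Theorem 3.2.1 provides sufficient conditions under which $e_\phi = (e_{1\phi}^T, \ldots, e_{m\phi}^T)^T$ dominates $e_{LSE}$.

Corollary 3.2.1. Suppose $Q_k = V_k^{-1}$ and $mp - p_1 > 3$; $\phi(F)$ is non-decreasing in $F$, and $0 < \phi(F) < 2(mp - p_1 - 2)$. Then,

$$R(X_1\beta_1, \ldots, X_m\beta_m;\ e_\phi) \le R(X_1\beta_1, \ldots, X_m\beta_m;\ e_{LSE}).$$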


Proof  of  Corollary  3.2.1.  Let  h{F)  = <f>{F)/F.  Applying  (2.18)  of  Efron  and 

Morris  (1976)  and  (3.9)  of  Ghosh,  Saleh,  and  Sen  (1987),  the  last  term  of  (3.18) 
can  be  expressed  as 


Ep_^,..,0^,Ah}{F)F{SSE/{N  - mp  + 2))) 


= o'^E^ 


AF) 


F N — mp  -|-  2 
From  (3.22),  and  using  the  property  <i>  {F)  > 0, 


<f>{F)4>'{F)). 


(3.22) 


The  corollary  follows  from  (3.23). 


(3.23) 


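Note that under the loss $\sigma^{-2}L$, with $L$ as in (3.4), $e^*_{LSE}$ is a constant risk minimax estimator of $(X\beta)^*$. Hence, any estimator dominating $e^*_{LSE}$ is also a minimax estimator of $(X\beta)^*$. Thus, Corollary 3.2.1 provides useful minimax estimators of $(X\beta)^*$. Also, it shows that $e^*_{EB}$ dominates $e^*_{LSE}$, and is a minimax estimator if $mp - p_1 \geq 3$, for $e^*_{EB}$ is a special case of $e^*_\phi$ with $\phi(F) = mp - p_1 - 2$. It is clear from (3.18) that when $\phi(F) = c$, a positive constant, then $R(X_1\beta_1, \ldots, X_m\beta_m;\ e_{1c}, \ldots, e_{mc})$ is minimized when $c = mp - p_1 - 2$. Thus, for $Q_k = V_k^{-1}$ $(k = 1, \ldots, m)$, $e^*_{EB}$ is optimal within the class of estimators of the form

$$X_{k1}\Big(\tilde\beta_{1\cdot} + \Big(1 - \frac{c}{F}\Big)(\hat\beta_{k1} - \tilde\beta_{1\cdot})\Big) + \Big(1 - \frac{c}{F}\Big)X_{k2}\hat\beta_{k2}, \qquad k = 1, \ldots, m.$$

It will be shown later that the Bayesian optimality of $e^*_{EB}$ holds within the same class of estimators irrespective of any p.d. weight matrix $Q_k$.

It is not necessarily true that $e^*_{EB}$ dominates $e^*_{RLSE1} = ((X_{11}\tilde\beta_{1\cdot})^T, \ldots, (X_{m1}\tilde\beta_{1\cdot})^T)^T$.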

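By expressing

$$X_{k1}\tilde\beta_{1\cdot} - X_k\beta_k = X_{k1}(\tilde\beta_{1\cdot} - E(\tilde\beta_{1\cdot})) + X_{k1}(E(\tilde\beta_{1\cdot}) - E(\tilde\beta_{k1})) + (X_{k1}C_{k11}^{-1}C_{k12} - X_{k2})\beta_{k2},$$

it is easy to check that for estimating $(X\beta)^*$, $e^*_{RLSE1}$ has the risk

$$R(X_1\beta_1, \ldots, X_m\beta_m;\ X_{11}\tilde\beta_{1\cdot}, \ldots, X_{m1}\tilde\beta_{1\cdot}) = p_1\sigma^2 + \sum_{k=1}^m \Big\{ E(\tilde\beta_{k1} - \tilde\beta_{1\cdot})^T C_{k11}\, E(\tilde\beta_{k1} - \tilde\beta_{1\cdot}) + \beta_{k2}^T C_{k22.1}\beta_{k2} \Big\}. \tag{3.24}$$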


Also, from (3.18) with $\phi(F) = mp - p_1 - 2$, one obtains that

$$R(X_1\beta_1, \ldots, X_m\beta_m;\ e_{1EB}, \ldots, e_{mEB}) = mp\sigma^2 - 2\sigma^2(mp - p_1 - 2)^2 E\Big(\frac{1}{F}\Big) + (mp - p_1 - 2)^2 E\Big(\frac{SSH_1 + SSH_2}{F^2}\Big). \tag{3.25}$$


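For fixed $\beta_1, \ldots, \beta_m$ and $\sigma^2$, $SSH_1$, $SSH_2$, and $SSE$ are independently distributed with $(SSH_1 + SSH_2) \sim \sigma^2\chi^2_{mp - p_1}(\lambda)$,

$$\lambda = \frac{1}{2\sigma^2}\Big\{\sum_{k=1}^m E(\tilde\beta_{k1} - \tilde\beta_{1\cdot})^T C_{k11}\, E(\tilde\beta_{k1} - \tilde\beta_{1\cdot}) + \sum_{k=1}^m \beta_{k2}^T C_{k22.1}\beta_{k2}\Big\},$$

and $SSE \sim \sigma^2\chi^2_{N - mp}$. It follows from (3.25) that

$$R(X_1\beta_1, \ldots, X_m\beta_m;\ e_{1EB}, \ldots, e_{mEB}) = mp\sigma^2 - \sigma^2(mp - p_1 - 2)^2\,\frac{N - mp}{N - mp + 2}\,E(mp - p_1 - 2 + 2K)^{-1}, \tag{3.26}$$

where $K \sim \mathrm{Poisson}(\lambda)$. Thus from (3.24) and (3.26), as $\lambda \to 0$, $R(X_1\beta_1, \ldots, X_m\beta_m;\ X_{11}\tilde\beta_{1\cdot}, \ldots, X_{m1}\tilde\beta_{1\cdot}) \to p_1\sigma^2$, in which case $R(X_1\beta_1, \ldots, X_m\beta_m;\ e_{1EB}, \ldots, e_{mEB}) \to \sigma^2[\,mp - (mp - p_1 - 2)(N - mp)/(N - mp + 2)\,] > p_1\sigma^2$. On the other hand, as $\lambda \to \infty$, $R(X_1\beta_1, \ldots, X_m\beta_m;\ X_{11}\tilde\beta_{1\cdot}, \ldots, X_{m1}\tilde\beta_{1\cdot}) \to \infty$, but $R(X_1\beta_1, \ldots, X_m\beta_m;\ e_{1EB}, \ldots, e_{mEB}) \to mp\sigma^2$. Hence neither $e^*_{RLSE1}$ nor $e^*_{EB}$ dominates the other. The above phenomenon is expected. Small $\lambda$ indicates that $\beta_{k1}$ is close to $\beta_1$, and $\beta_{k2}$ is close to 0 for all $k$, in which case clearly $e^*_{RLSE1}$ is the desired estimator. On the other hand, large $\lambda$ indicates substantial rejection of $H_0: \beta_{k1} = \beta_1$ and $\beta_{k2} = 0$ for all $k$, in which case the model (3.3) is expected to perform poorly. The EB estimator turns out to be fairly robust. Note that $e^*_{RLSE1}$ is not a minimax estimator of $(X\beta)^*$. Since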

R{Xi^^,  • • • , XfnP^,  Xii^^  , • • • , oo  as  A oo,  and  from  the  robustness 

criterion,  e*^g  is  evidently  the  winner  over  or  e*gggg  when 
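The behaviour of the two risks as $\lambda$ varies can be checked numerically. The sketch below evaluates (3.26) by summing the Poisson expectation directly; it also assumes that (3.24) reduces to $p_1\sigma^2 + 2\sigma^2\lambda$ when $Q_k = V_k^{-1}$, which agrees with the limits stated in the text. The function names are hypothetical.

```python
import math

def expected_inverse(a, lam):
    """E[1 / (a + 2K)] for K ~ Poisson(lam), by direct summation."""
    if lam == 0.0:
        return 1.0 / a
    k_max = int(lam + 12.0 * math.sqrt(lam) + 60)   # generous truncation point
    total = 0.0
    for k in range(k_max + 1):
        # Poisson probabilities computed in log space to avoid underflow.
        log_pmf = -lam + k * math.log(lam) - math.lgamma(k + 1)
        total += math.exp(log_pmf) / (a + 2 * k)
    return total

def eb_risk(lam, m, p, p1, N, sigma2=1.0):
    """Frequentist risk of the EB estimator as given in (3.26)."""
    a = m * p - p1 - 2
    ratio = (N - m * p) / (N - m * p + 2)
    return m * p * sigma2 - sigma2 * a ** 2 * ratio * expected_inverse(a, lam)

def rlse1_risk(lam, p1, sigma2=1.0):
    """Risk of the pooled restricted estimator, assuming (3.24) = p1*s2 + 2*s2*lam."""
    return p1 * sigma2 + 2.0 * sigma2 * lam
```

With the dimensions of Section 3.4 ($m = 3$, $p = 8$, $p_1 = 4$, $N = 47$, $\sigma^2 = 1$), `eb_risk(0.0, 3, 8, 4, 47)` is about 7.44 while `rlse1_risk(0.0, 4)` is 4; for large $\lambda$ the EB risk stays below $mp\sigma^2 = 24$ while the restricted risk grows without bound.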

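However, the estimator $e^*_{EB}$ is obtained by estimating $B$, the Bayes shrinking factor, by a random variable which can be bigger than 1 with positive probability. As mentioned earlier, this deficiency is remedied by $e^{+*}_{EB}$. For every estimator $e^*_\phi$ defined in (3.17), there is a corresponding estimator $e^{+*}_\phi$ given by

$$e^+_{k\phi} = X_{k1}\Big(\tilde\beta_{1\cdot} + \Big(1 - \frac{\phi(F)}{F}\Big)^+(\hat\beta_{k1} - \tilde\beta_{1\cdot})\Big) + \Big(1 - \frac{\phi(F)}{F}\Big)^+ X_{k2}\hat\beta_{k2}, \qquad k = 1, \ldots, m. \tag{3.27}$$

It can be expected that $e^{+*}_\phi$ has smaller risk than $e^*_\phi$.


Theorem 3.2.2. Under the loss (3.4) with $Q_k = V_k^{-1}$,

$$R(X_1\beta_1, \ldots, X_m\beta_m;\ e^+_{1\phi}, \ldots, e^+_{m\phi}) < R(X_1\beta_1, \ldots, X_m\beta_m;\ e_{1\phi}, \ldots, e_{m\phi}). \tag{3.28}$$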




Proof of Theorem 3.2.2. After some algebra, it can be obtained that


= t (1  - - ((1  - I 


*=1 


(X*l(^l  ~^l)  +^*2^2) 


F ' F 

*=i 


F '^^[^>1] 


’ —mdi) 


(3.29) 


From (3.29), it suffices to show that


- - Xi.0^) 

k=l 

1(^*1  ~ ^1.)  ~ ^1.)  ~ ^*2^22.1^42  ~ ^2Jtj  V fc, 

SS£  = e,  3^  > 1}  > 0,  (3.30) 

for all $h_{1k} > 0$, $h_{2k} > 0$ and $e > 0$. Next, using

+ KjciiE{^^)  — E{^^))  + {Xi^iCj^iiCj^i2  — 2Lk2)§_k2^  (3.31) 

and 

Kkii,_  - 

— K.ki[^i  ~ (=^1^*11^12  “ =^*2)^*2’  (3.32) 


Eq.(3.30)  can  be  expressed  as 


a.  ,.■{(£(&.  -I,,) 

+ ^2^*221^2 1 ~ ^1.)  = ^1*. 
^2^*22.i^2  = ^2t,  VA:,  SSE  = e,  3^  > l}. 


F 


(3.33) 




Since $(\tilde\beta_{11}^T \cdots \tilde\beta_{m1}^T)^T$, $(\hat\beta_{12}^T \cdots \hat\beta_{m2}^T)^T$ and $SSE$ are mutually independent, (3.33) can be written as
m 

~0ki) 

~ ^1.)  = ^1*5  k] 

+ (^2^*22.1^21^*2^221^*2  ^ ^2*)}-  (3.34) 

From the proof of Theorem 2.2.3, it can be shown that

m 

^ ■^^■<'^'((^*2^22.1^jt2l^*2^22.1^;t2  “ ^2*}  > 0,  (3.35) 

V h^k  > 0. 


Recall  C = Er=,  &„ 


C_iii  ■ ■ • 0 

f2niQ.  ^Qin  •••  CLiiiC 

. a •■•  . 

,G».u£''£m  ■■■ 

“d  SSH,  = Er=.  (h,  - If  &„(!,.  - 4 ) 


= 9 


—11  —ml 


^ T 

i.r.M 


L^.i 


Since 


I 

i^lll 

0 

. 

) = 

. 

• 

• 

: 

ILJ 

0 

y^-1 

^mll  . 

is p.d., there exists a nonsingular $Z$ such that $V = ZZ^T$. Now,


SSHi  = ^Z’^AZS 


(3.36) 


with $Z^TAZ$ being idempotent and $\mathrm{rank}(Z^TAZ) = (m - 1)p_1$, and


1 

' L ■ 

1 

;■ 

U.1  J 

■ L^. 



There exists an orthogonal matrix $P$ such that

$$P^T(Z^TAZ)P = \begin{pmatrix} I_{(m-1)p_1} & 0_{(m-1)p_1 \times p_1} \\ 0_{p_1 \times (m-1)p_1} & 0_{p_1 \times p_1} \end{pmatrix}.$$

From  (3.36), 


note  that 


SSHi  = {P4VP{Z^AZ)P^{PI), 


a = Pl-^  Nmp,{PZ-^E 
combining  (3.37)  (3.38)  and  (3.39),  one  gets 


> I-mpi)  J 


SSHi  = 


= ai^a, 


(3.37) 


(3.38) 


(3.39) 


(3.40) 


where  a,-  is  the  ith  element  of  a,  and  = (ai  • • • Using  (3.39)  and 

(3.40),  and  applying  the  argument  in  Theorem  6.2  of  Lehmann  (1983),  yields 


*=i 

^ > 0-  (3.41) 


Theorem 3.2.2 is hence proved as a consequence of (3.35) and (3.41).

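Finally, based on the hypothesis $H_0: \beta_{k1} = \beta_1$ and $\beta_{k2} = 0$ for all $k$, a general PTE is given by

$$e^{PTE}_k = X_{k1}\big(\tilde\beta_{1\cdot} + g(F)(\hat\beta_{k1} - \tilde\beta_{1\cdot})\big) + X_{k2}\,g(F)\,\hat\beta_{k2}, \qquad \forall k, \tag{3.42}$$

where $g(F) = I_{[F > d]}$ with $d$ a positive constant depending on the chosen level of significance, and $I$ the usual indicator function. The corresponding modified EB estimator,

$$e_{kMEB} = X_{k1}\Big(\tilde\beta_{1\cdot} + \Big(1 - \frac{\phi_0(F)}{F}\Big)g(F)(\hat\beta_{k1} - \tilde\beta_{1\cdot})\Big) + X_{k2}\Big(1 - \frac{\phi_0(F)}{F}\Big)g(F)\,\hat\beta_{k2}, \qquad k = 1, \ldots, m, \tag{3.43}$$

dominates $e^*_{PTE} = ((e^{PTE}_1)^T, \ldots, (e^{PTE}_m)^T)^T$ under certain conditions. This is shown in the following theorem.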


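Theorem 3.2.3. Consider the loss (3.4) with $Q_k = V_k^{-1}$. If $0 < \phi_0(F) < 2(mp - p_1 - 2)$, then

$$R(X_1\beta_1, \ldots, X_m\beta_m;\ e_{1MEB}, \ldots, e_{mMEB}) < R(X_1\beta_1, \ldots, X_m\beta_m;\ e^{PTE}_1, \ldots, e^{PTE}_m). \tag{3.44}$$

Proof of Theorem 3.2.3. Using (3.18), and following the procedure in the proof of Theorem 2.2.4, (3.44) can be obtained immediately.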


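Note that $e_{kMEB}$ estimates $X_k\beta_k$ by $X_{k1}\tilde\beta_{1\cdot}$ if $g(F) = 0$, i.e., the null hypothesis is accepted at a desired level of significance, and it estimates $X_k\beta_k$ by an EB type estimator as described earlier in this section otherwise. Also, the modified EB estimator $e^*_{MEB} = (e_{1MEB}^T, \ldots, e_{mMEB}^T)^T$ can be further improved by a positive part modified estimator $e^{+*}_{MEB} = ((e^+_{1MEB})^T, \ldots, (e^+_{mMEB})^T)^T$ as

$$e^+_{kMEB} = X_{k1}\Big(\tilde\beta_{1\cdot} + \Big(1 - \frac{\phi_0(F)}{F}\Big)^+ g(F)(\hat\beta_{k1} - \tilde\beta_{1\cdot})\Big) + X_{k2}\Big(1 - \frac{\phi_0(F)}{F}\Big)^+ g(F)\,\hat\beta_{k2}, \qquad k = 1, \ldots, m. \tag{3.45}$$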
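The EB, positive part, preliminary test and modified EB estimators differ only in the scalar weight placed on the departure of the least squares fit from the restricted fit. A minimal sketch of that weight follows; the function name and arguments are hypothetical, and the critical value `d` is supplied by the caller rather than computed from a significance level.

```python
def shrink_weight(F, phi, d=None, positive_part=False):
    """Weight w such that estimate = restricted + w * (lse - restricted).

    phi           : value of phi(F) (or phi_0(F)) at the observed F
    d             : critical value of the preliminary test; None means no test
    positive_part : truncate 1 - phi/F at zero, as in (3.27) and (3.45)
    """
    w = 1.0 - phi / F
    if positive_part:
        w = max(w, 0.0)
    g = 1.0 if (d is None or F > d) else 0.0   # indicator g(F) = I[F > d]
    return w * g

# EB:   shrink_weight(F, phi)                  EB+:  shrink_weight(F, phi, positive_part=True)
# PTE:  shrink_weight(F, 0.0, d)               MEB:  shrink_weight(F, phi0, d)
# MEB+: shrink_weight(F, phi0, d, positive_part=True)
```

When the test rejects, the PTE weight is 1 (the least squares fit) and the MEB weight coincides with the EB weight; when it accepts, both weights are 0 and the restricted fit is used.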


Now, $e^*_{EB}$ and the other associated estimators of $(X\beta)^*$ are evaluated by the Bayes risk. In view of Theorem 3.2.1, the Bayes risk of $e^*_{EB}$ is smaller than that of $e^*_{LSE}$ when $Q_k = V_k^{-1}$ regardless of the prior, while $e^{+*}_{EB}$ always dominates $e^*_{EB}$ irrespective of any prior. However, the Bayes risk optimality of $e^*_{EB}$ over $e^*_{LSE}$ is not limited to $Q_k = V_k^{-1}$. The following theorem will show this superiority of the Bayes risk of $e^*_{EB}$ over $e^*_{LSE}$ under the prior presented in the beginning of this section.


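Theorem 3.2.4. Consider the model $Y_k \sim N(X_k\beta_k, \sigma^2 V_k)$, for $k = 1, \ldots, m$, where $V_k$'s are p.d. and $\beta_k \sim N(\nu, \tau^2(X_k^T V_k^{-1} X_k)^{-1})$ for all $k$, and $\nu^T = (\nu_1^T\ 0^T)$.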



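Let $\xi$ denote this given prior and let $r(\xi; e_1, \ldots, e_m)$ denote the Bayes risk of an estimator $(e_1, \ldots, e_m)$ of $(X_1\beta_1, \ldots, X_m\beta_m)$ under the prior $\xi$ when the loss is given in (3.4). Then,

$$r(\xi; e_{1EB}, \ldots, e_{mEB}) < r(\xi; X_1\hat\beta_1, \ldots, X_m\hat\beta_m). \tag{3.46}$$

Proof of Theorem 3.2.4. Let

$$e_{kc} = X_{k1}\Big(\hat\beta_{k1} - \frac{c}{F}(\hat\beta_{k1} - \tilde\beta_{1\cdot})\Big) + \Big(1 - \frac{c}{F}\Big)X_{k2}\hat\beta_{k2}, \qquad k = 1, \ldots, m. \tag{3.47}$$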

According  to  the  loss  (3.4), 

^(^)  §-lct  ■ ■ ■ 5 §.mc) 
m 

(iu  - In  ~ j(ku  - i,)f}XLQ,Kn  1 

m 

il.2  - - yinfyXl:Q,Xn  ] 

+ prmln-ln-jL) 

~ ^fcl  ~ ^l))^}X^l0jcXlc2  I 

m 

+ (3.48) 

Since 

Eihn-in  - -&.))(!„ -In  - 

= ’’’Sin  t + -®(|^5 (^1  - 2i.)(2,i  - iif) 

- (3.49) 

and 

2 c(N  — mp)  1 - - 

^ N-mp  + 2 ^SSHi  + SSHi  ~ ~ ^i.)^)-  (3-50) 




By  using  (3.6),  (3.50)  can  be  expressed 

, ^ - ">p)  E, 

N-mp  + 2 


as 


^SSHi  + SSH2 


(3.51) 


Using  (3.11),  independence  between  ^^^’s  and  independence  between  and 

SSHu  independence  between  and  , and  for  all  k, 

applying Basu’s theorem and Lemma 1 in Ghosh, Saleh, and Sen (1987), Equation (3.51) equals


^,^  c{N-mp)  U(g^, 

N-mp  + 2^  E{SSHi  + SSH^) 

^ C-.nCj.uV0jC^2^Cj;,\ 

E{SSHi  + SSH2)  ^ 

2 „ c(N  — mp)  1 

(iV  - mp  + 2)  (mp  - pi)  ^ J ’ 

where $C = \sum_{k=1}^m C_{k11}$. Again, according to the same argument as in (3.52),


(3.52) 


= [^n.2  - g ^) 

(iV  - mp  + 2)  (mp  - pi)  (mp  - pi  - 2) ' 

Combining (3.49), (3.52) and (3.53), the first term of (3.48) can be expressed as


2 

Y.Wtr[SZl^(Xl,Q^X^,]  + ( f 

k=i  mp  — Pi  — 2 


2c)o^B 


N — mp 

[N  -mp  + 2)(mp  - pi) 


(3.54) 


Now,  applying  Basu’s  theorem  and  Lemma  1 in  Ghosh,  Saleh,  and  Sen  (1987), 


- 


mp-  Pi -2 


- 2c)a^B 


N — mp 

(iV-mp  + 2)(mp-pi)-*”n-  (3-55) 




From  (3.55),  the  last  term  of  (3.48)  becomes 


m 2 

^{cr  tr[C.k22.l{Kk2Qi^Kk2)  ] + — 2c)a^B- 

*=i  ~ Pi  ~ 2 (iV  — mp  + 2)  (mp  — pi) 

[ ^k22.1^k2^k—>‘^  (3.56) 


if  [ ^k22.1^k2^k—>^^ 

Finally,  by  using  the  same  argument,  the  cross-product  term  can  be  expressed  as 

^(L  - iki  - jilki  - IMK  - ik2  - 
= -(^H^l\.2^uQl2\)  - E{^(tkl  - h)(§-k2  - 

- EijC§_ki  - mki\^))fk2) + E{^2ilki - k&) 

= -^\Q-A.2Qj,12Qrk^2)  - s 

N-mp-2  ^ SSH1  + SSH2  i 


N-mp  + 2 ''  SSHi  + SSH2  ’ 
— (^11.2^12^22) 

C2 


- (- 


2c)a^5- 


N — mp 


mp  — Pi  - 2 {N  - mp  + 2)  [mp  — Pi) 

From  (3.57),  the  cross-product  term  in  (3.48)  can  be  expressed  as 


^ii^12£*2Vi-  (3.57) 


m 

^{-(7  if  \G.kll.2Gjol2C.k22^2Q k^l  1 

^mp  - pi  - 2 ^{N-mp  + 2}(mp  - p,)  I | 

- I £«.,c*jia‘.xLa,Xt2 1 - ( — — - - 2cyB, 

mp-  Pi -2  [N  -mpE  2)  [mp  - pi) 

i^  [ S.k22^2iQ^n.2KliQj^2Ck2  ]}•  (3.58) 

By  combining  (3.54),  (3.55)  and  (3.58),  it  can  be  obtained  that 


^(^>-lc’  ■ ■ ' 5^mc) 
m 

i=l 

+ ( i.-'icVB-, UAC.  r-'  I 

^P  Pi  2 [N  — mp 2)[mp  — piy  — * — *22.1  J 

+ i'-[{GZii-Qr^)2aiQkKki]}, 


(3.59) 




where  ^ Kk2)^Q^  -Xjt2)-  From  (3.59),  it  can  be 

concluded that $r(\xi; e_{1c}, \ldots, e_{mc}) < r(\xi; X_1\hat\beta_1, \ldots, X_m\hat\beta_m)$ when $0 < c < 2(mp - p_1 - 2)$. Since $e_{kc} = e_{kEB}$ if $c = mp - p_1 - 2$, Theorem 3.2.4 is proved.

The next theorem gives necessary and sufficient conditions under which the Bayes risk of $e^*_{EB}$ is smaller than that of $e^*_{RLSE1}$.

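Theorem 3.2.5. Consider the set up of Theorem 3.2.4. Then,

$$r(\xi; e_{1EB}, \ldots, e_{mEB}) < r(\xi; X_{11}\tilde\beta_{1\cdot}, \ldots, X_{m1}\tilde\beta_{1\cdot}), \tag{3.60}$$

if and only if $(1 - B)^2/B^2 > 1 - (mp - p_1 - 2)(N - mp)/((mp - p_1)(N - mp + 2))$ with $B = \sigma^2/(\sigma^2 + \tau^2)$.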

Proof  of  Theorem  3.2.5.  By  expressing 


+ {XklQ.kllQjcl2  ~ Xk2)§_j^2' 
it  can  be  checked  that 


m 


+ tr  I - tf(Kl,Q^Xj„)  ] 

+ 21r  I - !,.)"■  (2fT, a - it,))  I 


where  (inSrACtj,  it,)^Q.  (i*i£*n£*i2  - i*j)-  Since 


(3.61) 


(3.62) 




= (C  ^Ctii  - /)(^^  + Q-khQjcn§_^2y^ 

= 2-  (3.63) 

By  using  (3.61), (3.62)  and  (3.63),  one  gets 

m 

t=l 

+ <’■  ( - i,M^uSj.X„i)  I 

+ iae;i.|}.  (3.64) 

Now,  - 1^,))  = r^(e;A  -£-■).  By  using  ir|(xrz;‘X»)-' 

f— *)1  “ iX.^\Q uKki)  1 +^'" [£k£*22.i  1>  (3.59)  with  c = mp  — pi  —2 

and  (3.64), 


= (-.2  _ ^2  I - ”^P)  (^P  - Pi  - 4 

{N  — mp  + 2)  (mp  — pi) 

+ (r|(e;A-e-')2ff,2,x,.i}). 


*=i 


(3.65) 


From  (3.65),  the  conclusion  of  the  theorem  follows. 

The consequence of the above theorem is clear. If $\sigma^2 < \tau^2$, then $B < 1/2$, in which case $(1 - B)^2/B^2 > 1$. Thus, $e^*_{EB}$ always dominates $e^*_{RLSE1}$ in its Bayes risk if the sampling variability is smaller than the prior variability. However, if $\tau^2$ is much smaller than $\sigma^2$, the model (3.3) seems to be appropriate, and clearly $e^*_{RLSE1}$ becomes a winner.
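The condition of Theorem 3.2.5 is easy to evaluate for given dimensions and variances. The sketch below does so; the function name is hypothetical, and the example values ($m = 3$, $p = 8$, $p_1 = 4$, $N = 47$, $\sigma^2 = 4$, $\tau^2 \in \{1, 4, 9\}$) are those that appear to be used in the numerical example of Section 3.4.

```python
def eb_beats_rlse1(sigma2, tau2, m, p, p1, N):
    """True when the if-and-only-if condition of Theorem 3.2.5 holds."""
    B = sigma2 / (sigma2 + tau2)                      # Bayes shrinking factor
    lhs = (1.0 - B) ** 2 / B ** 2                     # equals (tau2 / sigma2) ** 2
    rhs = 1.0 - (m * p - p1 - 2) * (N - m * p) / ((m * p - p1) * (N - m * p + 2))
    return lhs > rhs

# Right-hand side for the example dimensions: 1 - 18*23/(20*25) = 0.172.
for tau2 in (1.0, 4.0, 9.0):
    print(tau2, eb_beats_rlse1(4.0, tau2, m=3, p=8, p1=4, N=47))
```

For these values the condition fails at $\tau^2 = 1$ and holds at $\tau^2 = 4$ and $\tau^2 = 9$, which is the same ordering of the two estimators that the first three columns of Table 3.2 display.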


A general class of minimax estimates of $(X\beta)^*$, which was described earlier in this section, can be expressed as

$$e_{kc} = X_{k1}\Big(\tilde\beta_{1\cdot} + \Big(1 - \frac{c}{F}\Big)(\hat\beta_{k1} - \tilde\beta_{1\cdot})\Big) + X_{k2}\Big(1 - \frac{c}{F}\Big)\hat\beta_{k2}, \qquad k = 1, \ldots, m, \tag{3.66}$$

where $c$ is a proper constant. The following theorem shows that $e^*_{EB}$ is the optimal estimate of $(X\beta)^*$ within the class of all estimators of the form $e^*_c = (e_{1c}^T, \ldots, e_{mc}^T)^T$.

Theorem  3.2.6.  Consider  the  same  set  up  as  of  Theorem  3.2.4.  Then, 


^Icj  ■ ■ ■ ) 5 ‘ ‘ » ^ms) 


k = l 


+ ( 


mp  — Pi  —2 


- 2c)a^B 


N — mp 


{N  - mp  + 2){mp  - pi) 


!;{<<■  l£be«Vi  I + <r  I (C;,‘i  - £r')x^iQ„Ki„ )}.  (3.67) 


k=l 


For  mp  — Pi  > 3,  the  above  risk  is  minimized  at  c = mp  — pi  — 2. 
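A short worked step makes the minimizing value explicit. It is offered as a sketch and assumes, as the displayed risk suggests, that $c$ enters (3.67) only through the factor $q(c)$ below, multiplied by a positive quantity that does not depend on $c$.

$$q(c) = \frac{c^2}{mp - p_1 - 2} - 2c, \qquad q'(c) = \frac{2c}{mp - p_1 - 2} - 2 = 0 \iff c = mp - p_1 - 2,$$

$$q''(c) = \frac{2}{mp - p_1 - 2} > 0 \ \text{ when } mp - p_1 > 2, \qquad q(c) < 0 \iff 0 < c < 2(mp - p_1 - 2).$$

The last equivalence gives the same range of $c$ that is used at the end of the proof of Theorem 3.2.4.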

Proof  of  Theorem  3.2.6.  From  the  proof  of  Theorem  2.2.8,  it  can  be  shown  that 


^ ^ 5 ’ ■ ■ ) §mB  ) 
m 

= B)J2trHXlyS'Ki)-'^,Q,X,].  (3.68) 

* = 1 

Then, Eq. (3.67) can be proved by combining (3.59) and (3.68).

3.3  Empirical  Bayes  Subset  Estimators 

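In this section four empirical Bayes subset estimators of $\beta_{k1}$ are discussed and compared, under the loss

$$L(\beta_{k1}, a_k) = (a_k - \beta_{k1})^T U_k (a_k - \beta_{k1}). \tag{3.69}$$

The corresponding shrinkage coefficient is derived from the usual $F$ ratio for testing $H_0: \beta_{k2} = 0$. If $e^{(1)}_{k1}$ is defined as an estimator of $\beta_{k1}$ for some $k$ that shrinks the unrestricted least squares estimator $\hat\beta_{k1}$ to the restricted least squares estimator $\tilde\beta_{k1}$ under the hypothesis $H_0: \beta_{k2} = 0$, then $e^{(1)}_{k1}$ can be expressed as

$$e^{(1)}_{k1} = \hat\beta_{k1} - \frac{\phi(F_k)}{F_k}(\hat\beta_{k1} - \tilde\beta_{k1}), \tag{3.70}$$

where $F_k = (N - mp + 2)\,\hat\beta_{k2}^T C_{k22.1}\hat\beta_{k2}/SSE$.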




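Now, $e^{(2)}_{k1}$ is defined as an estimator that shrinks the unrestricted least squares estimator $\hat\beta_{k1}$ to the common restricted least squares estimator $\tilde\beta_{1\cdot}$ under the hypothesis $H_0: \beta_{k1} = \beta_1$ and $\beta_{k2} = 0$ for all $k$. If the shrinkage coefficient, $\phi(F_k)/F_k$, is the same one described in $e^{(1)}_{k1}$, then

$$e^{(2)}_{k1} = \hat\beta_{k1} - \frac{\phi(F_k)}{F_k}(\hat\beta_{k1} - \tilde\beta_{1\cdot}). \tag{3.71}$$

Replacing the shrinkage coefficient $\phi(F_k)/F_k$ by $\phi(F)/F$ of $e^{(1)}_{k1}$ gives

$$e^{(3)}_{k1} = \hat\beta_{k1} - \frac{\phi(F)}{F}(\hat\beta_{k1} - \tilde\beta_{k1}), \tag{3.72}$$

where $F = (N - mp + 2)(SSH_1 + SSH_2)/SSE$ has been mentioned in Section 3.2.

The last one is a subset of $e_{k\phi}$ described in Section 3.2. Thus, $e^{(4)}_{k1}$ can be expressed as

$$e^{(4)}_{k1} = \hat\beta_{k1} - \frac{\phi(F)}{F}(\hat\beta_{k1} - \tilde\beta_{1\cdot}). \tag{3.73}$$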

The comparison among those estimators is based on their frequentist risks and Bayes risks. Now, these estimators of $\beta_{k1}$ are first evaluated for some individual $k$ according to the Bayes criterion. Let $\phi(F) = \phi(F_k) = c$, a positive constant, and rename $e^{(t)}_{k1}$ as $e^{(t)}_{ck1}$ for $t = 1, 2, 3, 4$. The following theorem shows that under the prior described in Section 3.2, the Bayes risk of $e^{(4)}_{ck1}$ with $c = mp - p_1 - 2$, i.e., the subset of $e_{kEB}$, is the smallest one among the four types of estimators of $\beta_{k1}$ irrespective of any p.d. $U_k$.

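Theorem 3.3.1. Consider the prior $\xi$ in Theorem 3.2.4. Let $r(\xi, e_{k1})$ denote the Bayes risk of an estimator $e_{k1}$ of $\beta_{k1}$ for some individual $k$ under the prior $\xi$ when the loss is given in (3.69). Then,

$$\min_c\{r(\xi, e^{(4)}_{ck1})\} \le \min\big\{\min_c\{r(\xi, e^{(1)}_{ck1})\},\ \min_c\{r(\xi, e^{(2)}_{ck1})\},\ \min_c\{r(\xi, e^{(3)}_{ck1})\}\big\}.$$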




Proof of Theorem 3.3.1. By expressing $e^{(1)}_{ck1}$ as

$$e^{(1)}_{ck1} = \hat\beta_{k1} - \frac{c}{F_k}(\hat\beta_{k1} - \tilde\beta_{k1}), \tag{3.74}$$

one gets


- §_kyu.k(i^^  ~^ki)^ 

+ (3.75) 

Recall $B = \sigma^2/(\sigma^2 + \tau^2)$. From (3.6), (3.11) and the mutual independence among $\tilde\beta_{k1}$'s, $\hat\beta_{k2}$'s and $SSE$,


c 


_ _2 


JV  — mn  4-9  ' -T  _ ~ !• 


TV  — mp  + 2 

Now,  by  applying  Basu’s  theorem,  Eq.  (3.76)  equals 


a^cB- 


N — mp  C 
N — mp  + 2 


^2— *22.1^2 


1-1 


(3.76) 


^11.2  ~ ^kll 

P2 


(3.77) 


Similarly, 


= o c 


E( 


_ N-mp  c„,c;A 


N — mp  + 2 


— ^2^2 


a^c^B- 


(tk2S^22.lt^^Y 

N-mp  _^A.2-C:*i\ 


) 


TV-mp  + 2 (P2)(P2  -2)  ■ 


(3.78) 




By  using  (3.77)  and  (3.78),  it  can  be  then  shown  that 


+ a^B 


(iV  - mp  + 2)p2  P2  - 2 

[ (^tll.2  ~ G-kn)lLk  ]) 


(3.79) 


which is minimized at $c = p_2 - 2$. The minimum of $r(\xi, e^{(1)}_{ck1})$ is


_ g{N  - mp)(p2  - 2) 


'’■((£w..!-ei'A)2 


(N  -mp  + 2)p,  I-  (3-80) 

By  using  the  same  arguments  described  above,  the  mutual  independence  of 
la  “d  lu  - &.>  and  the  fact  that  £(^_  - )’'='^(|j,  - ) 

= {a^  + — C“*),  one  gets 

+ a^B-  (-^-2c) 

{N  -mp  + 2)p2  P2  - 2 ’ 

tr[{C-;;L2-C-^]U,],  (3.81) 

which is minimized at $c = p_2 - 2$. Thus, the minimum of $r(\xi, e^{(2)}_{ck1})$ can be expressed as


’•(5.4-!)»,)  = ( Cj-iVtCa 


- "■?)(?!  -2) 
(iV-  mp  + 2)p2 
X i^[(^Hi.2-Cr')C^]. 


Recall that $C_{k11} = X_{k1}^T V_k^{-1} X_{k1}$ and $C = \sum_{k=1}^m C_{k11}$.

The Bayes risk of $e^{(3)}_{ck1}$ also can be expressed as

N — mp 


(3.82) 


_j_  Q ■* ' '"-f , c 

{N  — mp  + 2)  (mp  — px)  mp  — px  — 2 
[ {Okh.2  ~ G^ii)lLk  ]> 


(3.83) 




which is minimized at $c = mp - p_1 - 2$. The minimum of $r(\xi, e^{(3)}_{ck1})$ can be shown as

'^{^i§^rnp-pi-2)kl)  ~ [G-kll.2lLk\ 

^2p(^'-mp)(mp-pi  -2) 

[N  -mp  + 2)(mp  - pi) 

X ^^\{Q-i:n.2-CA)lh]-  (3.84) 

By  using  (3.54), 

^ ^2^ N -mp 

{N  — mp  + 2)  (mp  — p^)  mp  — Pi  — 2 

ir  [ (^n.2  - Q.  ^)Uk  ],  (3.85) 

which is minimized at $c = mp - p_1 - 2$. The minimum of $r(\xi, e^{(4)}_{ck1})$ is given by

rU,^mp-p,-2)ki)  = 

^2p(iV-  mp)(mp-pi  - 2) 

{N  — mp  + 2)  (mp  — p^) 

X (3.86) 

Note  that  and  \ = 

S^tii^i2^i:2Vi^2i^i\  +QJin  -QT^-  Comparing  (3.80), (3.82), (3.84)  and  (3.86), 
Theorem  3.3.1  follows. 

The condition described in Theorem 3.3.1 is clearly expected. It is only for $e^{(4)}_{ck1}$ that the estimators of $\nu_1$ as well as $1/(\sigma^2 + \tau^2)$ use all the available data, i.e., $Y_1, \ldots, Y_m$. The other three estimators use only $Y_k$ but not all the $Y_1, \ldots, Y_m$ for estimating either $\nu_1$ or $1/(\sigma^2 + \tau^2)$ or both.

If $c$ is assumed to satisfy the conditions that $0 < c < 2(p_2 - 2)$ in $e^{(1)}_{ck1}$ and $e^{(2)}_{ck1}$, and $0 < c < 2(mp - p_1 - 2)$ in $e^{(3)}_{ck1}$ and $e^{(4)}_{ck1}$, it is then easy to demonstrate that all these four estimators of $\beta_{k1}$ are better than $\hat\beta_{k1}$.

Suppose  one  expresses 

Iki  ~ ijci  = iiki  ~ 

+ ^hOjc\2§_i.2i 


and  gets 




= (a'lE*)  + rhr  [ (czly,  - e;,‘.)E*). 


(3.87) 


Therefore, 


(iV-mp  + 2)p2 

> 0 


if  and  only  if  (l  — B)^  jB^  > 1 — ^P)  (P2  2) 

(iV  - mp  + 2)p2  ’ 


(3.88) 


and 


(N  — mp  + 2)p2  — — ' — * J 


+ 

> 0, 


(r^ -o‘  + -CMr/l 

(iV-mp  + 2)p2  I ^-*11-2 


if  and  only  if  (1  - BV  /B"^  > 1 - '^P)  (P2  2) 

^ (iV-mp  + 2)p2- 


Also  based  on  the  same  procedure  given  above, 


= (r^  - ('"P  - P-  - ^) fc-  _c-.u/| 

(TV  - mp  + 2)  (mp  - pi)  ^ ^*ii)  ] 


> 0, 


(3.89) 


if  and  only  if  (1  — BY / B^  > 1 — mp)[mp  pi  - 2) 

(TV  — mp  + 2)  (mp  — pi)  ’ 


(3.90) 


and 


,2  p (^  - mp)(mp  - Pi  - 21 
(TV  — mp  + 2)  (mp  — p^) 




{N-mp  + 2){mp  - p{)>  ‘ ^kn)lLk  ] 


> 0, 


if  and  only  \i{l-BYlB^>  ^ [N  - mp){mp  - p^  - 2) 

{N  -mp  + 2)  (mp  - pj  ‘ 


(3.91) 


The results of (3.88), (3.89), (3.90), and (3.91) are not surprising since $\tilde\beta_{k1}$ is expected to perform better than $e^{(t)}_{ck1}$ $(t = 1, 2, 3, 4)$ when the prior variance $\tau^2$ is much smaller than the sampling variance $\sigma^2$. On the other hand, if the sampling variance $\sigma^2$ is smaller than the prior variance $\tau^2$, i.e., there is no strong evidence to show $\beta_{k2} = 0$, these four $e^{(t)}_{ck1}$'s can have smaller Bayes risk than $\tilde\beta_{k1}$.

Next, the frequentist criterion is utilized to evaluate the estimators.

Recall (3.6) and the mutual independence among $\tilde\beta_{k1}$, $\hat\beta_{k2}$ and $SSE$ under the distributions of $Y_k$, $k = 1, \ldots, m$. The frequentist risk of $e^{(1)}_{k1}$ under the loss in (3.69) is


+ 


■^^1’ p2 


)• 


(3.92) 


As mentioned earlier, by using a generalization of Stein’s identity (Berger and Haff, 1983),

= (^^G-k22\Eg  ...a  c r-M 


'•  r.  p2 


) 


^ —k\i^i2^2^k2^ii^kn/{^SEl[N  — mp  + 2)) 


^ ^i^k)  !q-\  ^-1  V 


(3.93) 


Combining  (3.92)  and  (3.93)  yields 




iii»  ^ ^ P,  p2 

k 


) 


(3.94) 


f i^n.2  ~ G-kh)Hjc  ]) 

The  frequentist  risk  of  can  be  then  obtained  as 

- - Eiij)  -Qzhc,u{h,-pj] 

MF),]  ^ ^ 

^ ~ ^1.  ~ ^kllQja2^^^^}JLk  ] 

+ ^2,.  ■-.s.-'{^((l„  - &/£*(!„  - 1,) 

(3.95) 

By using a generalization of Stein’s identity (Berger and Haff, 1983), the cross-product term of (3.95) can be expressed as

2ahr\Ep^...p  C . 3 fc  r~^ 


- ^n^uhc2iKx-hf)/{SSE/{N  -mp  + 2)) 

Hf'k) 


+ ^(aVt -£-■)}£* 


(3.96) 


From  (3.95)  and  (3.96),  one  gets 


^(§.kV^kl) 


^''HQ.kh.2Uk)-2c^E0  ...p  ,2{2(^i^-^t^ 

4i’  ^ ^ Fk 


n 


^ i^k2^2lC-klxUkCk^xCkx2lk,  - (^,  - tO'^UkOj,l,Ck,,lj 




-r  [SSE /[N  - mp -\-2)) 

+ - hfUhi  - &.) 

~ ^l)  ^k2^i^^kl\lLkQ.kllQja2§_k^} • (3.97) 
By using the same argument, it can be shown that


^ (li!£*ji£ii‘i£i£Hi£ii2|j,  - (It;  - i^flhQZliQj.uffj,^) 
-T  (SSEI(N  — mp-{-2)) 


(3.98) 


and 


+ 

+ 

+ 


- Ij  - 2(|t;  - I, 

^2^2iQjciiJLkQ^iiQj^i2^^  / (S  S E / (N  -mp  + 2)) 


2(^*1  ^1.)  ^^*^11^12^2  +^*2^21  (3.99) 

Investigations of Eqs. (3.94), (3.97), (3.98) and (3.99) indicate that there is no winner among these four estimators as demonstrated by the comparisons of their frequentist risk with the loss defined as (3.69). This remains true even though $\phi(F) = \phi(F_k)$ is equal to a positive constant $c$.




3.4  Numerical  Examples 

To  illustrate  the  results  for  the  estimation  derived  in  the  previous  sections,  a 
Monte-Carlo  experiment  was  conducted  to  assess  the  frequentist  risks  as  well  as  the 
Bayes  risks  of  the  different  estimators. 

Random samples of $m = 3$ normally distributed random vectors $Y_k$ $(k = 1, 2, 3)$ of dimensions $n_1 = 16$, $n_2 = 14$, $n_3 = 17$ are taken in a $2^3$ factorial experiment. Here $Y_1$ is the vector of responses of the 1st individual, which is based on a $2^3$ factorial experiment replicated two times (see Section 2.4), and $Y_i$'s $(i = 2, 3)$ are the $i$th individual's response vectors, which are based on $2^3$ factorial experiments with unequal replications for certain treatment combinations. Let $Y_{ijk}$ denote the response of the $k$th individual receiving the $i$th treatment within the $j$th replicate. It is assumed that

$$E(Y_{ijk}) = \gamma_{0k} + \gamma_{1k}\xi_{1i} + \gamma_{2k}\xi_{2i} + \gamma_{3k}\xi_{3i} + \gamma_{12k}\xi_{1i}\xi_{2i} + \gamma_{13k}\xi_{1i}\xi_{3i} + \gamma_{23k}\xi_{2i}\xi_{3i} + \gamma_{123k}\xi_{1i}\xi_{2i}\xi_{3i},$$

where $i = 1, \ldots, 8$; $k = 1, 2, 3$. Note that in most cases, there are two responses corresponding to the $k$th individual receiving the $i$th treatment. The only exceptions are $(i, k) = (1, 2)$, $(6, 2)$ and $(2, 3)$. In the first two cases, there is one response. For the third case, there are three responses. Here $\xi_{1i} = 1$ for $i = 1, 2, 3, 4$ corresponds to the high level of $\xi_1$, while $\xi_{1i} = -1$ for $i = 5, 6, 7, 8$ corresponds to the low level of $\xi_1$. Also $\xi_{2i} = 1$ ($\xi_{3i} = 1$) for $i = 1, 2, 5, 6$ ($i = 1, 3, 5, 7$) corresponds to the high level of $\xi_2$ ($\xi_3$), and $\xi_{2i} = -1$ ($\xi_{3i} = -1$) for $i = 3, 4, 7, 8$ ($i = 2, 4, 6, 8$) corresponds to the low level of $\xi_2$ ($\xi_3$). For $\beta_{k1}^T = (\gamma_{0k}, \gamma_{1k}, \gamma_{2k}, \gamma_{3k})$ and $\beta_{k2}^T = (\gamma_{12k}, \gamma_{13k}, \gamma_{23k}, \gamma_{123k})$, $mp_1 = mp_2 = 12$ and $N = 16 + 14 + 17 = 47$. The design matrices $X_k$'s can be partitioned as $X_k = (X_{k1}\ X_{k2})$, $k = 1, 2, 3$, where

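$$X_{11}^T = \left(\begin{array}{*{16}{r}}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 \\
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 & 1 & -1 & -1
\end{array}\right),$$

$$X_{12}^T = \left(\begin{array}{*{16}{r}}
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 \\
1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 & 1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 & 1 & 1 & 1 & -1 & -1
\end{array}\right),$$

$$X_2^T = \begin{pmatrix} X_{21}^T \\ X_{22}^T \end{pmatrix} = \left(\begin{array}{*{14}{r}}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 \\
1 & 1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 & 1 & -1 & 1 & 1 & -1 & -1 \\
1 & 1 & 1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & -1 & -1 & 1 & -1 & -1 & 1 & 1 \\
1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & 1 & 1 \\
1 & -1 & -1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 & 1 & 1 & -1 & -1
\end{array}\right),$$

and

$$X_3^T = \begin{pmatrix} X_{31}^T \\ X_{32}^T \end{pmatrix} = \left(\begin{array}{*{17}{r}}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 \\
1 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
1 & 1 & -1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 \\
1 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 \\
1 & 1 & -1 & -1 & -1 & 1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 & 1 \\
1 & 1 & -1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & 1 & -1 & -1 & -1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 & 1 & 1 & 1 & -1 & -1
\end{array}\right).$$

In $X_2^T$ and $X_3^T$ the first four rows form $X_{k1}^T$ and the last four rows form $X_{k2}^T$.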

It  is  assumed  that  VIZ,)  = F{K,)  = V{Y^)  = and  g,  = V{Yj,), 

for  k = 1,2,3. 
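The design matrices can be generated from the level definitions rather than typed in. The sketch below assumes that treatments are ordered $i = 1, \ldots, 8$ with replicates of the same treatment kept adjacent; the names `REPLICATES` and `design` are hypothetical.

```python
import numpy as np

# Number of responses per treatment i = 1..8 for individuals k = 1, 2, 3:
# treatments 1 and 6 are observed once for k = 2, treatment 2 three times for k = 3.
REPLICATES = {
    1: [2, 2, 2, 2, 2, 2, 2, 2],
    2: [1, 2, 2, 2, 2, 1, 2, 2],
    3: [2, 3, 2, 2, 2, 2, 2, 2],
}

def design(k):
    """Return (X_k1, X_k2): main-effect and interaction blocks for individual k."""
    # Treatment i = 1..8 in the order implied by the level definitions.
    levels = [(a, b, c) for a in (1, -1) for b in (1, -1) for c in (1, -1)]
    main, inter = [], []
    for (x1, x2, x3), reps in zip(levels, REPLICATES[k]):
        for _ in range(reps):
            main.append([1, x1, x2, x3])
            inter.append([x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3])
    return np.array(main), np.array(inter)

X11, X12 = design(1)
print(X11.shape, design(2)[0].shape, design(3)[0].shape)   # (16, 4) (14, 4) (17, 4)
```

For the balanced first individual the columns are mutually orthogonal, so $X_{11}^T X_{11} = 16 I$ and $X_{11}^T X_{12} = 0$; the unequal replication for $k = 2, 3$ breaks this exact orthogonality.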

First the simulated frequentist risks are evaluated based on the loss as defined in (3.4). For a given set of $\beta_k$ $(k = 1, 2, 3)$, random vectors $Y_k$'s were generated from $N(X_k\beta_k, \sigma^2 I_{n_k})$ in each replication. The sampling procedure was repeated 1,000 times. The estimated frequentist risks of estimating $X_k\beta_k$ $(k = 1, 2, 3)$, i.e., the average squared error losses of the least squares estimators ($e^*_{LSE}$, the model in (3.1); $e^*_{RLSE1}$, the model in (3.3); $e^*_{RLSE2}$, the model in (3.2)), the empirical Bayes estimator ($e^*_{EB}$), the positive part empirical Bayes estimator ($e^{+*}_{EB}$), the preliminary test estimators ($e^*_{PTE\alpha}$, $\alpha$ being the level of significance, $\alpha$ = .10, .05, .01), the modified empirical Bayes estimators ($e^*_{MEB\alpha}$), and the positive part modified empirical Bayes estimators ($e^{+*}_{MEB\alpha}$), are compared. The results are shown in Table 3.1.
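The sampling procedure can be sketched as follows for two of the estimators, the full-model least squares fit and the pooled restricted fit. This is an illustration and not the author's program: it assumes $\sigma^2 = 1$, $V_k = Q_k = I$, and the treatment ordering described for the design matrices; the function names and the seed are hypothetical.

```python
import numpy as np

def designs():
    """Full designs X_k = (X_k1, X_k2) for the three individuals."""
    xi = np.array([[a, b, c] for a in (1, -1) for b in (1, -1) for c in (1, -1)])
    x1, x2, x3 = xi.T
    rows = np.column_stack([np.ones(8), x1, x2, x3,
                            x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3])
    reps = [[2] * 8, [1, 2, 2, 2, 2, 1, 2, 2], [2, 3, 2, 2, 2, 2, 2, 2]]
    return [np.repeat(rows, r, axis=0) for r in reps]

def simulate(betas, sigma2=1.0, n_rep=1000, seed=0, p1=4):
    """Average loss of the LSE and of the pooled restricted LSE over n_rep samples."""
    rng = np.random.default_rng(seed)
    X = designs()
    C = sum(x[:, :p1].T @ x[:, :p1] for x in X)            # C = sum_k C_k11
    loss_lse = loss_r1 = 0.0
    for _ in range(n_rep):
        Y = [x @ b + np.sqrt(sigma2) * rng.normal(size=len(x))
             for x, b in zip(X, betas)]
        pooled = np.linalg.solve(C, sum(x[:, :p1].T @ y for x, y in zip(X, Y)))
        for x, b, y in zip(X, betas, Y):
            mean = x @ b
            b_hat = np.linalg.lstsq(x, y, rcond=None)[0]
            loss_lse += np.sum((x @ b_hat - mean) ** 2)      # full-model LSE
            loss_r1 += np.sum((x[:, :p1] @ pooled - mean) ** 2)  # pooled restricted LSE
    return loss_lse / n_rep, loss_r1 / n_rep

beta = np.array([2.0, 4.0, 6.0, 3.0, 0.0, 0.0, 0.0, 0.0])
lse_risk, rlse1_risk = simulate([beta, beta, beta])
```

Under these assumptions the two averages should fall near their theoretical values $mp\sigma^2 = 24$ and $p_1\sigma^2 = 4$, which is in line with the first column of Table 3.1.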




Table 3.1: The estimated frequentist risk results for the $2^3$ factorial experiment.

The estimator
of $X_k\beta_k$'s
  $\beta_1^T$         (2 4 6 3 0 0 0 0)   (2 4 6 3 0 0 0 0)   (2 4 6 3 0 0 0 0)    (2 4 6 3 7 1 2 9)
  $\beta_2^T$         (2 4 6 3 0 0 0 0)   (2 5 5 2 0 0 0 0)   (2 4 6 3 6 -1 2 5)   (2 4 6 3 6 -1 2 5)
  $\beta_3^T$         (2 4 6 3 0 0 0 0)   (3 6 5 2 0 0 0 0)   (2 4 6 3 0 0 0 0)    (2 4 6 3 1 1 1 1)

$e^*_{LSE}$           23.9434             23.9434             23.9434              23.9434
$e^*_{RLSE1}$         4.18885             69.3494             885.765              3112.11
$e^*_{RLSE2}$         12.1528             12.1528             854.819              3082.02
$e^*_{EB}$            7.54115             20.4126             23.6275              23.8590
$e^{+*}_{EB}$         5.28671             20.4126             23.6275              23.8590
$e^*_{PTE.10}$        6.59654             24.3170             23.9434              23.9434
$e^*_{MEB.10}$        5.09850             20.7060             23.6275              23.8590
$e^{+*}_{MEB.10}$     5.09850             20.7060             23.6275              23.8950
$e^*_{PTE.05}$        5.56069             24.9046             23.9434              23.9434
$e^*_{MEB.05}$        4.78082             21.2411             23.6275              23.8590
$e^{+*}_{MEB.05}$     4.78082             21.2411             23.6275              23.8590
$e^*_{PTE.01}$        4.71660             28.9008             23.9434              23.9434
$e^*_{MEB.01}$        4.33754             25.1467             23.6275              23.8590
$e^{+*}_{MEB.01}$     4.33754             25.1467             23.6275              23.8590

Note. For the last two columns, during the 1,000 replications, the null hypothesis, $\beta_{k1} = \beta_1$ and $\beta_{k2} = 0$ $(k = 1, 2, 3)$, was rejected in every replication for all $\alpha$ = .10, .05 and .01. Hence, $e_{kPTE\alpha} = e_{kLSE}$, $e_{kMEB\alpha} = e_{kEB}$ and $e^+_{kMEB\alpha} = e^+_{kEB}$.




When $\beta_{k1} = \beta_1$ and $\beta_{k2} = 0$ $(k = 1, 2, 3)$, $e^*_{RLSE1} = ((X_{11}\tilde\beta_{1\cdot})^T, (X_{21}\tilde\beta_{1\cdot})^T, (X_{31}\tilde\beta_{1\cdot})^T)^T$ is the best estimator of $(X\beta)^* = ((X_1\beta_1)^T, (X_2\beta_2)^T, (X_3\beta_3)^T)^T$. However, in case $\beta_{k2} = 0$ $(k = 1, 2, 3)$, then $e^*_{RLSE2} = ((X_{11}\tilde\beta_{11})^T, (X_{21}\tilde\beta_{21})^T, (X_{31}\tilde\beta_{31})^T)^T$ is the best as an estimator of $(X\beta)^*$. Substantial improvement in the estimated frequentist risk of $e^*_{EB}$ over $e^*_{LSE} = ((X_1\hat\beta_1)^T, (X_2\hat\beta_2)^T, (X_3\hat\beta_3)^T)^T$, the LSE of the model in (3.1), was observed in the cases of $\beta_{k2} = 0$ $(k = 1, 2, 3)$. When $\beta_{k1}$'s are all identical, the estimated frequentist risk of $e^*_{EB}$ is found to be only about one fourth in magnitude of that of $e^*_{LSE}$. As proved in the earlier section, the estimators $e^*_{MEB}$ performed better than $e^*_{PTE}$. Finally, if $\beta_{k2} \neq 0$ for some $k$, $e^*_{EB}$ or $e^{+*}_{EB}$ has the smallest frequentist risks. Admittedly, the risk improvement is not significant over $e^*_{LSE}$. All other estimators are superior to $e^*_{RLSE1}$ and $e^*_{RLSE2}$, with considerably smaller estimated frequentist risks.

Table 3.2 shows the estimated Bayes risks of all the estimators. First, $\beta_k$ $(k = 1, 2, 3)$ were generated separately from $N(\nu, \tau^2(X_k^T X_k)^{-1})$, where $\nu$ and $\tau^2$ are the given values. For a generated set of $\beta_k$'s and a given $\sigma^2$, random vectors $Y_k$ $(k = 1, 2, 3)$ were then generated separately from $N(X_k\beta_k, \sigma^2 I_{n_k})$. Recall that $n_1 = 16$, $n_2 = 14$, $n_3 = 17$. Once again, the sampling procedure was repeated 1,000 times.

When $\tau^2 < \sigma^2$, i.e., when the prior variability is smaller than the sampling variability, the estimated risks of the different estimators maintain almost the same order as the corresponding estimated frequentist risks. The case $\nu_2 = 0$ ($\nu_2 \neq 0$) corresponds to the case $\beta_{11} = \beta_{21} = \beta_{31}$ and $\beta_{k2} = 0$ for all $k$ (for at least one of $\beta_{k2} \neq 0$). The only difference is that $e^{+*}_{EB}$ has the smallest estimated Bayes risk when $\nu_2 = 0$. However, the relation between the Bayes risks of $e^*_{EB}$ and $e^*_{RLSE1}$, as dependent upon the relation of $\tau^2$ and $\sigma^2$, when $\nu_2 = 0$, agrees well with Theorem 3.2.5. As $\tau^2 > \sigma^2$, even though $\nu_2 = 0$, $e^*_{RLSE1}$ was seen to perform poorly as compared with others. The increased prior variability diminishes the importance of the prior mean. As $\tau^2 \geq \sigma^2$, then the $e^*_{EB}$ and $e^{+*}_{EB}$ dominate both the LSE's of the



Table 3.2: The estimated Bayes risk results for the $2^3$ factorial experiment.

The estimator
of $X_k\beta_k$'s
  $\nu^T$             (3 5 1 6 0 0 0 0)                  (3 5 1 6 4 0 2 5)
  $\sigma^2$          4                                  4
  $\tau^2$            1          4          9            1          4          9

$e^*_{LSE}$           95.2410    95.2410    95.2410      95.2410    95.2410    95.2410
$e^*_{RLSE1}$         36.1561    95.9029    195.481      2069.51    2129.74    2229.81
$e^*_{RLSE2}$         60.2533    95.8641    155.215      2064.76    2100.34    2159.67
$e^*_{EB}$            43.1769    62.7633    75.2881      93.0472    93.0843    93.1498
$e^{+*}_{EB}$         33.2123    59.5154    74.6257      93.0472    93.0843    93.1498
$e^*_{PTE.10}$        53.2020    95.1617    100.376      95.2410    95.2410    95.2410
$e^*_{MEB.10}$        40.2773    72.3558    80.3389      93.0472    93.0843    93.1498
$e^{+*}_{MEB.10}$     40.2773    72.3558    80.3389      93.0472    93.0843    93.1498
$e^*_{PTE.05}$        46.3677    95.2423    104.812      95.2410    95.2410    95.2410
$e^*_{MEB.05}$        39.2176    77.8064    85.5323      93.0472    93.0853    93.1498
$e^{+*}_{MEB.05}$     39.2176    77.8064    85.5323      93.0472    93.0843    93.1498
$e^*_{PTE.01}$        38.6651    95.4421    124.036      95.2410    95.2410    95.2410
$e^*_{MEB.01}$        37.2853    87.1063    109.696      93.0472    93.0843    93.1498
$e^{+*}_{MEB.01}$     37.2853    87.1063    109.696      93.0472    93.0843    93.1498

Note. For $\nu^T$ = (3 5 1 6 4 0 2 5), during the 1,000 replications, the null hypothesis, $\beta_{k1} = \beta_1$ and $\beta_{k2} = 0$ $(k = 1, 2, 3)$, was rejected in every replication for all $\alpha$ = .10, .05 and .01. Hence, $e_{kPTE\alpha} = e_{kLSE}$, $e_{kMEB\alpha} = e_{kEB}$ and $e^+_{kMEB\alpha} = e^+_{kEB}$.




model in (3.1) and the model in (3.3). The domination is more heavily pronounced in the case when $\nu_2 = 0$.


CHAPTER 4

EMPIRICAL BAYES ESTIMATION OF MULTIVARIATE REGRESSION
MODEL II

4.1  Introduction 

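To estimate a multivariate linear regression model for $m$ distinct individuals, there may be a choice between the following two models:

$$Y_k = X_{k1}\beta_{k1} + X_{k2}\beta_{k2} + \varepsilon_k \tag{4.1}$$

and

$$Y_k = X_{k1}\beta_{k1} + \varepsilon_k, \tag{4.2}$$

where $Y_k$'s, $X_{ki}$'s $(i = 1, 2)$, $\beta_{k1}$, $\beta_{k2}$ and $\varepsilon_k$'s are the same as defined in Section 3.1. Estimators of $X_k\beta_k$'s, once determined, can be compared on the basis of general quadratic loss,

$$L(X_1\beta_1, \ldots, X_m\beta_m;\ e_1, \ldots, e_m) = \sum_{k=1}^m (e_k - X_k\beta_k)^T Q_k (e_k - X_k\beta_k), \tag{4.3}$$

where $Q_k$'s are some known p.d. matrices.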

In this chapter, it is assumed that in (4.1) and (4.2), $X_k = X$ and $V_k = V$ for all
$k = 1, \ldots, m$. The g-prior distribution of the coefficients $\beta_k$'s is assumed here as

$$ \beta_k \overset{ind}{\sim} N(\nu_k, \Sigma), \quad k = 1, \ldots, m, \tag{4.4} $$

with $\nu_k = \begin{pmatrix} D_1 a_k \\ 0 \end{pmatrix}$ and $\Sigma = \tau^2 (X^T V^{-1} X)^{-1}$,

where $a_k$ represents the values of the concomitant 'background' regressor variables
associated with the $k$th individual. If $A^T = (a_1, \ldots, a_m)$ is an $r \times m$ 'across individual'
design matrix of constants of full rank $r < m$, then $D_1$ is a $p_1 \times r$ matrix
of unknown parameters. It should be noted that $D$ is a $p \times r$ matrix and $p > p_1$.

The structure of the prior mean displayed in (4.4) is modified from the one given
in Reinsel (1985).

The empirical Bayes (EB) estimators of $(X\beta)^* = ((X\beta_1)^T, \ldots, (X\beta_m)^T)^T$ will
be introduced according to the above prior in Section 4.2. The EB estimators of
$X\beta_k$ serve as a weighted average of the LSE of $X\beta_k$ based on the model in (4.1)
and the weighted average of LSE's for the reduced model (4.2) for $k = 1, \ldots, m$.
If an F ratio is defined from the hypothesis $H_0: \beta_{k1} = D_1 a_k$ and $\beta_{k2} = 0$ for all
$k = 1, \ldots, m$, the empirical Bayes estimator of $X\beta_k$ leans more towards the LSE $X\hat\beta_k$ for
all $k$ if the observed F ratio is large, and to the weighted average of the reduced
LSE's if the observed F ratio is small.

A popular method of achieving the compromise estimator is the use of preliminary
test estimators (PTE) based on the rejection or acceptance of the null
hypothesis $H_0$. It is shown in Section 4.2 that for every PTE of $(X\beta)^*$, there is
a corresponding modified EB estimator which dominates the PTE. This is not
a surprising consequence, since the PTE does not take into account the degree of
evidence for or against the null hypothesis $H_0: \beta_{k1} = D_1 a_k$ and $\beta_{k2} = 0$ for all $k$.

Section 4.2 also introduces the positive part versions of the EB estimator. Several
theorems are given, which describe the frequentist as well as the Bayesian
properties of the proposed estimators. Section 4.3 introduces several estimators
of $\beta_{k1}$. These estimators are proposed in such ways that they shrink the unrestricted
LSE, $\hat\beta_{k1}$, to the restricted LSE, $\tilde\beta_{k1}$, or a certain weighted average of the
$\tilde\beta_{k1}$'s. Under the squared error loss, their frequentist risks and Bayes risks are compared.
It is shown that there are EB estimators of $(X\beta)^*$ which always dominate
$e^{*}_{LSE} = ((X\hat\beta_1)^T, \ldots, (X\hat\beta_m)^T)^T$ under squared error loss. Finally, in Section 4.4,
as a multivariate regression example, different estimators of $(X\beta)^*$ are compared
according to their simulated frequentist and Bayes risks.




4.2  The  Empirical  Bayes  Approach 

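In order to develop an EB procedure, a Bayes estimator of $X\beta_k$, for $k = 1, \ldots, m$,
should first be obtained. Based on the prior distribution in (4.4), the posterior of
$\beta_k$ given $Y_k$ is

$$ \beta_k \mid Y_k \sim N\left(\nu_k + (1-B)(\hat\beta_k - \nu_k),\ \sigma^2 (1-B)(X^T V^{-1} X)^{-1}\right), \quad k = 1, \ldots, m, \tag{4.5} $$

where $\nu_k = ((D_1 a_k)^T, 0^T)^T$ is the prior mean in (4.4), $\hat\beta_k = (X^T V^{-1} X)^{-1} X^T V^{-1} Y_k$ is the
BLUE of $\beta_k$, and $B = \sigma^2/(\sigma^2 + \tau^2)$. By expressing $\hat\beta_k^T = (\hat\beta_{k1}^T, \hat\beta_{k2}^T)$, the Bayes
estimator of $X\beta_k$ under any quadratic loss (4.3) is

$$ e_{kB}(Y_k) = E(X\beta_k \mid Y_k) = X_1 [D_1 a_k + (1-B)(\hat\beta_{k1} - D_1 a_k)] + (1-B) X_2 \hat\beta_{k2}, \quad k = 1, \ldots, m. \tag{4.6} $$

In an empirical Bayes framework, the parameters $D_1$, $\sigma^2$ and $\tau^2$ are unknown,
and need to be estimated from the marginal distribution of $Y_1, \ldots, Y_m$. Note that
marginally

$$ Y_k \overset{ind}{\sim} N_n(X_1 D_1 a_k,\ \sigma^2 V + \tau^2 P_X), \tag{4.7} $$

where $P_X = X (X^T V^{-1} X)^{-1} X^T$. By using Exercise 2.9, p. 33 of Rao (1973), and
recalling $B = \sigma^2/(\sigma^2 + \tau^2)$, it can be shown that

$$ (\sigma^2 V + \tau^2 P_X)^{-1} = \sigma^{-2} [V^{-1} - (1-B) V^{-1} P_X V^{-1}]. \tag{4.8} $$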
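The inverse in (4.8) can also be checked directly. The following is a sketch of one possible verification, assuming the Woodbury matrix identity rather than the exercise in Rao (1973) cited above; the shorthand $M = (X^T V^{-1} X)^{-1}$ is introduced here for illustration only and is not the author's notation.

```latex
% Sketch: verify (4.8) with the Woodbury identity, writing P_X = X M X^T,
% where M = (X^T V^{-1} X)^{-1} is shorthand introduced for this check only.
\begin{align*}
(\sigma^2 V + \tau^2 X M X^T)^{-1}
  &= \sigma^{-2} V^{-1}
     - \sigma^{-4} V^{-1} X
       \left( \tau^{-2} M^{-1} + \sigma^{-2} X^T V^{-1} X \right)^{-1}
       X^T V^{-1} \\
  &= \sigma^{-2} V^{-1}
     - \sigma^{-4} V^{-1} X
       \left( \frac{\sigma^2 + \tau^2}{\sigma^2 \tau^2} M^{-1} \right)^{-1}
       X^T V^{-1}
     && \text{since } X^T V^{-1} X = M^{-1} \\
  &= \sigma^{-2} V^{-1}
     - \sigma^{-2} \frac{\tau^2}{\sigma^2 + \tau^2} V^{-1} X M X^T V^{-1} \\
  &= \sigma^{-2} \left[ V^{-1} - (1 - B) V^{-1} P_X V^{-1} \right],
     && 1 - B = \frac{\tau^2}{\sigma^2 + \tau^2}.
\end{align*}
```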

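Now, let $C_{ij} = X_i^T V^{-1} X_j$ and $C_{ii.j} = C_{ii} - C_{ij} C_{jj}^{-1} C_{ji}$ ($i, j = 1, 2$). Using (4.7) and
(4.8), the maximum likelihood estimator of $D_1$ is given by

$$ \hat D_1 = (X_1^T V^{-1} X_1)^{-1} X_1^T V^{-1} Y^* A (A^T A)^{-1} = \tilde\beta^* A (A^T A)^{-1}, \tag{4.9} $$

where $Y^* = (Y_1, \ldots, Y_m)$ and $\tilde\beta^* = (\tilde\beta_{11}, \ldots, \tilde\beta_{m1})$, with $\tilde\beta_{k1} = (X_1^T V^{-1} X_1)^{-1} X_1^T V^{-1} Y_k$.
Recall $A^T = (a_1, \ldots, a_m)$.

Then, based on the normal equation

$$ X_1^T V^{-1} Y_k = C_{11} \hat\beta_{k1} + C_{12} \hat\beta_{k2}, \tag{4.10} $$

the following identity can be obtained:

$$ \tilde\beta_{k1} = \hat\beta_{k1} + C_{11}^{-1} C_{12} \hat\beta_{k2}. \tag{4.11} $$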

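Now, consider $Y_k - X_1 D_1 a_k = (X\hat\beta_k - X_1 D_1 a_k) + (Y_k - X\hat\beta_k)$. Since $P_X V^{-1} X_i = X_i$
($i = 1, 2$), by using the idempotency of $V^{-1} P_X$, and (4.8),

$$ (Y_k - X_1 D_1 a_k)^T (\sigma^2 V + \tau^2 P_X)^{-1} (Y_k - X_1 D_1 a_k)
   = \sigma^{-2} \{ B (X\hat\beta_k - X_1 D_1 a_k)^T V^{-1} (X\hat\beta_k - X_1 D_1 a_k)
   + (Y_k - X\hat\beta_k)^T V^{-1} (Y_k - X\hat\beta_k) \}. $$

Thus, adding all the $k$ components of the above equation, and using the identity
(4.11), one gets

$$ \sum_{k=1}^{m} (Y_k - X_1 D_1 a_k)^T (\sigma^2 V + \tau^2 P_X)^{-1} (Y_k - X_1 D_1 a_k)
   = \sigma^{-2} \Big\{ SSE + B \sum_{k=1}^{m} \big[ (\tilde\beta_{k1} - D_1 a_k)^T C_{11} (\tilde\beta_{k1} - D_1 a_k)
   + \hat\beta_{k2}^T C_{22.1} \hat\beta_{k2} \big] \Big\}, \tag{4.12} $$

where

$$ \sum_{k=1}^{m} (\tilde\beta_{k1} - D_1 a_k)^T C_{11} (\tilde\beta_{k1} - D_1 a_k)
   = tr\Big[ D_1^T C_{11} D_1 \sum_{k=1}^{m} a_k a_k^T \Big] - 2\, tr\Big[ D_1^T C_{11} \sum_{k=1}^{m} \tilde\beta_{k1} a_k^T \Big]
   + tr\Big[ C_{11} \sum_{k=1}^{m} \tilde\beta_{k1} \tilde\beta_{k1}^T \Big] $$

$$ = tr[ (D_1^T C_{11} D_1) A^T A - 2 D_1^T C_{11} \tilde\beta^* A ] + tr[ \tilde\beta^* \tilde\beta^{*T} C_{11} ] $$

$$ = tr\{ (D_1 - \tilde\beta^* A (A^T A)^{-1})^T C_{11} (D_1 - \tilde\beta^* A (A^T A)^{-1}) A^T A \}
   + tr[ \tilde\beta^* (I - A (A^T A)^{-1} A^T) \tilde\beta^{*T} C_{11} ]. \tag{4.13} $$

By using (4.9), the last term in (4.13) can be expressed as

$$ tr[ \tilde\beta^* (I - A (A^T A)^{-1} A^T) \tilde\beta^{*T} C_{11} ] \tag{4.14} $$

$$ = tr[ (\tilde\beta^* - \tilde\beta^* A (A^T A)^{-1} A^T)(\tilde\beta^* - \tilde\beta^* A (A^T A)^{-1} A^T)^T C_{11} ] $$

$$ = tr[ (\tilde\beta^* - \hat D_1 A^T)(\tilde\beta^* - \hat D_1 A^T)^T C_{11} ] $$

$$ = \sum_{k=1}^{m} (\tilde\beta_{k1} - \hat D_1 a_k)^T C_{11} (\tilde\beta_{k1} - \hat D_1 a_k). \tag{4.15} $$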


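Note that $SSE = \sum_{k=1}^{m} (Y_k - X\hat\beta_k)^T V^{-1} (Y_k - X\hat\beta_k)$, the usual error sum of squares.
Let $SSH_1 = \sum_{k=1}^{m} (\tilde\beta_{k1} - \hat D_1 a_k)^T C_{11} (\tilde\beta_{k1} - \hat D_1 a_k)$, and $SSH_2 = \sum_{k=1}^{m} \hat\beta_{k2}^T C_{22.1} \hat\beta_{k2}$.
The summation of $SSH_1$ and $SSH_2$ is the sum of squares due to the hypotheses
$H_{01}: \beta_{k1} = D_1 a_k$ for all $k$ and $H_{02}: \beta_{k2} = 0$ for all $k$. From (4.9), (4.12), (4.13),
and (4.15), it is easy to see that
$(\hat D_1, SSH_1 + SSH_2, SSE)$ is complete sufficient for $(D_1, \sigma^2, \tau^2)$.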


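From (4.15),

$$ SSH_1 = tr[ \tilde\beta^* (I - A (A^T A)^{-1} A^T) \tilde\beta^{*T} C_{11} ]
         = tr[ C_{11}^{1/2} \tilde\beta^* (I - A (A^T A)^{-1} A^T) \tilde\beta^{*T} C_{11}^{1/2} ], $$

where $C_{11}^{1/2} \tilde\beta^* \sim N(C_{11}^{1/2} D_1 A^T,\ (\tau^2 + \sigma^2) I_{p_1} \otimes I_m)$, and $(I - A (A^T A)^{-1} A^T)$ is
idempotent with rank $= m - r$. Hence,

$$ C_{11}^{1/2} \tilde\beta^* (I - A (A^T A)^{-1} A^T) \tilde\beta^{*T} C_{11}^{1/2}
   \sim Wishart_{p_1}((\tau^2 + \sigma^2) I_{p_1},\ m - r,\ 0). \tag{4.16} $$

By using (4.16), one gets

$$ SSH_1 \sim (\tau^2 + \sigma^2) \chi^2_{(m-r) p_1}. \tag{4.17} $$

Due to the fact that the $\tilde\beta_{k1}$'s are independent of the $\hat\beta_{k2}$'s (see Lemma 2 in Ghosh,
Saleh, and Sen, 1987), and $SSH_2 \sim (\tau^2 + \sigma^2) \chi^2_{m p_2}$, it is evident that $SSH_1 + SSH_2
\sim (\tau^2 + \sigma^2) \chi^2_{mp - r p_1}$. The UMVUE of $(\tau^2 + \sigma^2)^{-1}$ is then equivalent to
$(mp - r p_1 - 2)/(SSH_1 + SSH_2)$ for $(mp - r p_1) > 3$ and $p = p_1 + p_2$. The best
scale invariant estimate of $\sigma^2$ is $SSE/(N - mp + 2)$, where $N = \sum_{k=1}^{m} n_k$, and the
MLE of $D_1$ is $\hat D_1 = \tilde\beta^* A (A^T A)^{-1}$. Substituting all these estimators for the unknown
parameters in (4.6) gives

$$ e_{kEB} = X_1 \Big( \hat D_1 a_k + \Big(1 - \frac{mp - r p_1 - 2}{F}\Big)(\hat\beta_{k1} - \hat D_1 a_k) \Big)
   + \Big(1 - \frac{mp - r p_1 - 2}{F}\Big) X_2 \hat\beta_{k2}, \quad k = 1, \ldots, m, \tag{4.18} $$

where $F = (N - mp + 2)(SSH_1 + SSH_2)/SSE$, a constant multiple of the F ratio
for testing $H_0: \beta_{k1} = D_1 a_k$ and $\beta_{k2} = 0$ for all $k$.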
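The step from the three estimators to the shrinkage factor in (4.18) can be spelled out as follows. This is only a sketch of the arithmetic, using the distributional facts stated above and the standard moment $E[1/\chi^2_q] = 1/(q-2)$ for $q > 2$; the shorthand $q = mp - r p_1$ is introduced here for brevity and is not the author's notation.

```latex
% Sketch: why B is estimated by (mp - r p_1 - 2)/F, with q = mp - r p_1.
\begin{align*}
E\!\left[ \frac{q - 2}{SSH_1 + SSH_2} \right]
  &= \frac{q - 2}{\tau^2 + \sigma^2}\, E\!\left[ \frac{1}{\chi^2_q} \right]
   = \frac{1}{\tau^2 + \sigma^2}, \\
\hat B
  &= \hat\sigma^2 \cdot \widehat{(\tau^2 + \sigma^2)^{-1}}
   = \frac{SSE}{N - mp + 2} \cdot \frac{q - 2}{SSH_1 + SSH_2}
   = \frac{mp - r p_1 - 2}{F}.
\end{align*}
```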

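In practice, it is much more appropriate to use the positive part EB estimator,
$e^{+*}_{EB} = (e^{+T}_{1EB}, \ldots, e^{+T}_{mEB})^T$, because $B$ is estimated by the quantity $(mp - r p_1 - 2)/F$,
which can be greater than 1:

$$ e^{+}_{kEB}(Y) = X_1 \Big( \hat D_1 a_k + \Big(1 - \frac{mp - r p_1 - 2}{F}\Big)^{+} (\hat\beta_{k1} - \hat D_1 a_k) \Big)
   + \Big(1 - \frac{mp - r p_1 - 2}{F}\Big)^{+} X_2 \hat\beta_{k2}, \quad k = 1, \ldots, m, \tag{4.19} $$

where $a^{+} = \max(a, 0)$.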
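The shrinkage in (4.18) and (4.19) can be illustrated with a small sketch. It assumes the sums of squares $SSH_1$, $SSH_2$ and $SSE$ and the two fitted vectors ($X_1 \hat D_1 a_k$ and $X\hat\beta_k$) have already been computed elsewhere; the function names `eb_shrinkage_weight` and `eb_fit` are hypothetical and chosen for this illustration only. It uses the fact that (4.18) can be rewritten as the restricted fit plus a weight times the difference between the unrestricted and restricted fits.

```python
def eb_shrinkage_weight(ssh1, ssh2, sse, N, m, p, r, p1, positive_part=True):
    """Weight 1 - (mp - r*p1 - 2)/F placed on the unrestricted LSE fit.

    F = (N - mp + 2)(SSH1 + SSH2)/SSE as in (4.18). With positive_part=True
    the weight is truncated at zero, as in (4.19).
    """
    q = m * p - r * p1
    if q < 3:
        raise ValueError("mp - r*p1 must be at least 3")
    F = (N - m * p + 2) * (ssh1 + ssh2) / sse
    weight = 1.0 - (q - 2) / F
    return max(weight, 0.0) if positive_part else weight


def eb_fit(restricted_fit, lse_fit, weight):
    """Combine X_1 D1_hat a_k (restricted) and X beta_hat_k (LSE) elementwise."""
    return [r0 + weight * (l0 - r0) for r0, l0 in zip(restricted_fit, lse_fit)]


# Illustrative numbers only (not taken from the dissertation's data).
w = eb_shrinkage_weight(ssh1=20.0, ssh2=8.0, sse=26.0, N=48, m=3, p=8, r=2, p1=4)
fit = eb_fit([0.0, 2.0], [2.0, 4.0], w)
```

A small observed F drives the weight to zero, so the estimate falls back on the restricted fit, while a large F leaves the LSE fit nearly unchanged.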

Note from (4.19) that for very large F values signifying substantial departure
from $H_0: \beta_{k1} = D_1 a_k$ and $\beta_{k2} = 0$ for all $k$, $e^{+}_{kEB}$ is very close to $X\hat\beta_k$, whereas for
very small F values signifying enough support for $H_0$, $e^{+}_{kEB}$ is very close to $X_1 \hat D_1 a_k$
for all $k$. When there is no clear-cut decision for or against $H_0$, $e^{+}_{kEB}$ is in some
sense a weighted average of $X_1 \hat D_1 a_k$ and $X\hat\beta_k$, with the weights being adaptively
determined by the data.

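Next, the estimators $e^{*}_{LSE}$, $e^{*}_{RMLE}$, $e^{*}_{EB}$ and $e^{+*}_{EB}$ will be evaluated according
to the criteria of frequentist and Bayes risks. First the frequentist criterion will
be utilized.

The general class of estimators of $X\beta_k$ is

$$ e_{k\phi} = X_1 \Big[ \hat D_1 a_k + \Big(1 - \frac{\phi(F)}{F}\Big)(\hat\beta_{k1} - \hat D_1 a_k) \Big]
   + \Big(1 - \frac{\phi(F)}{F}\Big) X_2 \hat\beta_{k2}, \quad k = 1, \ldots, m. \tag{4.20} $$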

The following theorem provides the risk expression for $e^{*}_{\phi} = (e^{T}_{1\phi}, \ldots, e^{T}_{m\phi})^T$ under
the loss (4.3) with $Q = V^{-1}$. The risk is denoted by $R(X\beta_1, \ldots, X\beta_m;\ e_{1\phi}, \ldots, e_{m\phi})$.


Theorem 4.2.1. If $\phi$ is a differentiable function, then, under the loss (4.3) with
$Q = V^{-1}$,

_ rpi)  + - ^)F] 

p2^  + SSH2)}. 


(4.21) 


Proof  of  Theorem  4.2.1.  By  using  (4.11),  it  can  be  shown  that 


(4.22) 




Then, 


m 

~ ^{^k<t>  ~ =^jt) 

U_1  * 


*=1 

m 


= E - E(l^)  - - b,a,)  ] 


+ Ci,,-0 


k2 


^3  I’-c  (5  fl  u 

l±k2f  ^2211^2  


= mpa“  - f [ - £(^  )) 

i=l 

+ ils-^^AL-ij]} 

,<i>‘{F)  "• 

ib=l 


j2/p\  m 

+ £[  (^1  - D^a,fC,,0  - D,g^) 

^ k=i 


+ llc2^22.lli,2^^- 

A generalization of Stein's identity (Berger and Haff, 1983) gives

E(  (L  - s.a.)^e.i(^,  - E(Pj) 


(4.23) 


^2 


= <7^b»  „«  „.p(£(f)  .?gg.  + ggg. 


S(F) 

+ Pi  + ”2P2)}. 


F2  ^55^;/(AT  — mp  — 2) 


(4.24) 

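Recall that $SSH_1 = \sum_{k=1}^{m} (\tilde\beta_{k1} - \hat D_1 a_k)^T C_{11} (\tilde\beta_{k1} - \hat D_1 a_k)$, $SSH_2 = \sum_{k=1}^{m} \hat\beta_{k2}^T C_{22.1} \hat\beta_{k2}$, and
$F = ((SSH_1 + SSH_2)/SSE)(N - mp + 2)$. Since $R(X\beta_1, \ldots, X\beta_m;\ e_{1LSE}, \ldots, e_{mLSE}) = mp\sigma^2$,
(4.21) can be obtained by combining (4.23) and (4.24). The
proof of the theorem is complete.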

The following corollary to Theorem 4.2.1 provides sufficient conditions under
which $e^{*}_{\phi}$ dominates $e^{*}_{LSE} = ((X\hat\beta_1)^T, \ldots, (X\hat\beta_m)^T)^T$.


Corollary 4.2.1. Suppose $Q = V^{-1}$ and $mp - r p_1 > 3$; $\phi(F)$ is non-decreasing in
$F$, and $0 < \phi(F) < 2(mp - r p_1 - 2)$. Then, $e^{*}_{\phi}$ dominates $e^{*}_{LSE}$.

Proof of Corollary 4.2.1. Corollary 4.2.1 can be obtained immediately by following
the proof of Corollary 3.2.1.

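Note that under the loss $\sigma^{-2} L(X\beta_1, \ldots, X\beta_m;\ e_1, \ldots, e_m)$, $e^{*}_{LSE}$ is the constant risk
minimax estimator of $(X\beta)^*$. Therefore, any estimator dominating $e^{*}_{LSE}$ is also a
minimax estimator of $(X\beta)^*$. Thus, Corollary 4.2.1 provides useful minimax estimators
of $(X\beta)^*$. Also, it shows that $e^{*}_{EB}$ dominates $e^{*}_{LSE}$ and is a minimax estimator if
$Q = V^{-1}$, for $e^{*}_{EB}$ being a special case of $e^{*}_{\phi}$ with $\phi(F) = mp - r p_1 - 2$. It is clear from
(4.21) that when $\phi(F) = c$, a positive constant, then $R(X\beta_1, \ldots, X\beta_m;\ e_{1\phi}, \ldots, e_{m\phi})$
is minimized when $c = mp - r p_1 - 2$. Thus, for $Q = V^{-1}$, $e^{*}_{EB}$ is optimal within the
class of estimators of the form

$$ X_1 \Big( \hat D_1 a_k + \Big(1 - \frac{c}{F}\Big)(\hat\beta_{k1} - \hat D_1 a_k) \Big) + \Big(1 - \frac{c}{F}\Big) X_2 \hat\beta_{k2}, \quad k = 1, \ldots, m. $$

It will be shown later that the Bayesian optimality of $e^{*}_{EB}$ holds within the same
class of estimators irrespective of any p.d. matrix $Q$.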

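It is not necessarily true that $e^{*}_{EB}$ dominates $e^{*}_{RMLE} = ((X_1 \hat D_1 a_1)^T, \ldots, (X_1 \hat D_1 a_m)^T)^T$.
By expressing

$$ X_1 \hat D_1 a_k - X\beta_k = X_1 (\hat D_1 a_k - E(\hat D_1 a_k)) + X_1 (E(\hat D_1 a_k) - E(\tilde\beta_{k1}))
   + (X_1 C_{11}^{-1} C_{12} - X_2) \beta_{k2}, $$

it is easy to check that $e^{*}_{RMLE}$ in estimating $(X\beta)^*$ has the risk

$$ R(X\beta_1, \ldots, X\beta_m;\ X_1 \hat D_1 a_1, \ldots, X_1 \hat D_1 a_m)
   = \sum_{k=1}^{m} tr[ C_{11} V(\hat D_1 a_k) ]
   + \sum_{k=1}^{m} E(\tilde\beta_{k1} - \hat D_1 a_k)^T C_{11} E(\tilde\beta_{k1} - \hat D_1 a_k)
   + \sum_{k=1}^{m} \beta_{k2}^T C_{22.1} \beta_{k2}, \tag{4.25} $$

where $\sum_{k=1}^{m} tr[ C_{11} V(\hat D_1 a_k) ] = \sigma^2 p_1 \sum_{k=1}^{m} a_k^T (A^T A)^{-1} a_k = \sigma^2 p_1\, tr(A (A^T A)^{-1} A^T) = r p_1 \sigma^2$.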

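Also, from (4.21) with $\phi(F) = mp - r p_1 - 2$, it can be obtained that

$$ R(X\beta_1, \ldots, X\beta_m;\ e_{1EB}, \ldots, e_{mEB}) = mp\sigma^2
   - 2\sigma^2 (mp - r p_1 - 2)^2 E\Big(\frac{1}{F}\Big)
   + (mp - r p_1 - 2)^2 E\Big(\frac{SSH_1 + SSH_2}{F^2}\Big). \tag{4.26} $$

For fixed $\beta_1, \ldots, \beta_m$ and $\sigma^2$, the variables $SSH_1$, $SSH_2$, and $SSE$ are independently
distributed with $(SSH_1 + SSH_2) \sim \sigma^2 \chi^2_{mp - r p_1}(\lambda)$,
$\lambda = (1/(2\sigma^2)) \big( \sum_{k=1}^{m} E(\tilde\beta_{k1} - \hat D_1 a_k)^T C_{11} E(\tilde\beta_{k1} - \hat D_1 a_k) + \sum_{k=1}^{m} \beta_{k2}^T C_{22.1} \beta_{k2} \big)$,
and $SSE \sim \sigma^2 \chi^2_{N - mp}$.
It follows from (4.26) that

$$ R(X\beta_1, \ldots, X\beta_m;\ e_{1EB}, \ldots, e_{mEB})
   = mp\sigma^2 - \sigma^2 (mp - r p_1 - 2)^2 \Big( \frac{N - mp}{N - mp + 2} \Big) E(mp - r p_1 - 2 + 2K)^{-1}, \tag{4.27} $$

where $K \sim Poisson(\lambda)$. Thus from (4.25) and (4.27), as $\lambda \to 0$,
$R(X\beta_1, \ldots, X\beta_m;\ X_1 \hat D_1 a_1, \ldots, X_1 \hat D_1 a_m) \to r p_1 \sigma^2$, in which case
$R(X\beta_1, \ldots, X\beta_m;\ e_{1EB}, \ldots, e_{mEB}) \to \sigma^2 [mp - (mp - r p_1 - 2)(N - mp)/(N - mp + 2)] > r p_1 \sigma^2$.
On the other hand, as $\lambda \to \infty$,
$R(X\beta_1, \ldots, X\beta_m;\ X_1 \hat D_1 a_1, \ldots, X_1 \hat D_1 a_m) \to \infty$, but
$R(X\beta_1, \ldots, X\beta_m;\ e_{1EB}, \ldots, e_{mEB}) \to mp\sigma^2$. Hence neither dominates the other.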
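The crossing of the two risk curves can be checked numerically. The sketch below evaluates (4.27) by summing the Poisson mixture directly, and writes the risk in (4.25) as $r p_1 \sigma^2 + 2\sigma^2\lambda$, which follows from the definition of $\lambda$ above. The function names and the example values ($m = 3$, $p = 8$, $r = 2$, $p_1 = 4$, $N = 48$, $\sigma^2 = 1$) are illustrative assumptions, and the direct summation is only suitable for moderate $\lambda$ (the starting term `exp(-lam)` underflows for very large values).

```python
import math


def poisson_mean_inverse(lam, q, tol=1e-15):
    """E[(q - 2 + 2K)^(-1)] for K ~ Poisson(lam), by direct summation."""
    term = math.exp(-lam)  # P(K = 0)
    total = 0.0
    k = 0
    while True:
        total += term / (q - 2 + 2 * k)
        k += 1
        term *= lam / k  # P(K = k) from P(K = k - 1)
        if k > lam and term < tol:  # past the mode and negligible
            break
    return total


def eb_risk(lam, m, p, r, p1, N, sigma2):
    """Frequentist risk of the EB estimator under Q = V^{-1}, as in (4.27)."""
    q = m * p - r * p1
    ratio = (N - m * p) / (N - m * p + 2)
    return m * p * sigma2 - sigma2 * (q - 2) ** 2 * ratio * poisson_mean_inverse(lam, q)


def rmle_risk(lam, r, p1, sigma2):
    """Risk in (4.25), written as r*p1*sigma^2 + 2*sigma^2*lambda."""
    return r * p1 * sigma2 + 2.0 * sigma2 * lam


# Illustrative configuration (assumed values, for demonstration only).
cfg = dict(m=3, p=8, r=2, p1=4, N=48, sigma2=1.0)
eb_small, eb_large = eb_risk(0.0, **cfg), eb_risk(50.0, **cfg)
rmle_small, rmle_large = rmle_risk(0.0, 2, 4, 1.0), rmle_risk(50.0, 2, 4, 1.0)
```

Under these assumed values the restricted estimator has the smaller risk at $\lambda = 0$ and the larger risk at $\lambda = 50$, while the EB risk stays below $mp\sigma^2$ throughout.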

The above phenomenon is expected. Small $\lambda$ indicates that $\beta_{k1}$ is close to $D_1 a_k$,
and $\beta_{k2}$ is close to 0 for all $k$, in which case clearly $e^{*}_{RMLE}$ is the desired estimator.
On the other hand, large $\lambda$ indicates substantial rejection of $H_0: \beta_{k1} = D_1 a_k$ and
$\beta_{k2} = 0$ for all $k$, in which case the model (4.1) is expected to perform well. The EB
estimator turns out to be fairly robust. Note that $e^{*}_{RMLE}$ is not a minimax estimator
of $(X\beta)^*$, since $R(X\beta_1, \ldots, X\beta_m;\ X_1 \hat D_1 a_1, \ldots, X_1 \hat D_1 a_m) \to \infty$ as $\lambda \to \infty$. From
the robustness criterion, $e^{*}_{EB}$ is evidently superior to $e^{*}_{LSE}$ and $e^{*}_{RMLE}$ when
$Q = V^{-1}$.




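However, the estimator $e^{*}_{EB}$ is obtained by estimating $B$, the Bayes shrinking
factor, by a random variable which can be bigger than 1 with positive probability.
As mentioned earlier, this deficiency is remedied by $e^{+*}_{EB}$. For every estimator $e_{k\phi}$
defined in (4.20), there is a corresponding estimator $e^{+}_{k\phi}$ given by

$$ e^{+}_{k\phi} = X_1 \Big[ \hat D_1 a_k + \Big(1 - \frac{\phi(F)}{F}\Big)^{+} (\hat\beta_{k1} - \hat D_1 a_k) \Big]
   + \Big(1 - \frac{\phi(F)}{F}\Big)^{+} X_2 \hat\beta_{k2}, \quad k = 1, \ldots, m. \tag{4.28} $$

It can be expected that $e^{+*}_{\phi}$ has smaller risk than $e^{*}_{\phi}$.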


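Theorem 4.2.2. Under the loss (4.3) with $Q = V^{-1}$,

$$ R(X\beta_1, \ldots, X\beta_m;\ e^{+}_{1\phi}, \ldots, e^{+}_{m\phi}) \le R(X\beta_1, \ldots, X\beta_m;\ e_{1\phi}, \ldots, e_{m\phi}). \tag{4.29} $$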

Proof  of  Theorem  4.2.2.  After  some  algebra,  one  gets 

^ , ■ ■ ■ . fa*,  ■ • • , ■ . . , ef*,  • ■ ■ , ei*) 


4.{F] 


k=l 


- Sifa,)  - fiio*)  + 


(X,&st  - - aat)  + 


F 


(Zi^ia*  - KlfV_-^{Xib^g^  - 


Using  the  identities 


— X0^  — Xiibig,!^  — E[b\Q^k))i 

+ x,(E(bia„)  - E(0j)  + (icr.‘a,  - 

and 


KiD,^-2Ci, 


Xiibi^k  ^.i)  + {XiQ.iiQn  ~ X2)0ijn> 


(4.30) 


(4.31) 


(4.32) 




and  the  mutual  independence  of  with  (DiOi  • • - , , SSH2,  and 

SSE.  Recall  that  SSHi  = ~ ~ Q-iQ^k)  s^nd  F is  a function 

of  SSHi,  SSH2  and  SSE. 

The  right  hand  side  of  (4.30)  can  be  expressed  as 

m 

X/  P Qii{Diah  — 

k=l 

+ ^2^22.1^i2Hl  - (4.33) 

Hence,  it  suffices  to  show  that 

m 

+ S.2^22.xh2\^SH,  = h„ 

*T 

^2— 22.i^jt2  = ^2it,  for  all  A;  = 1,  • • ■ , m, 

SSE  = > 1}  > 0,  (4.34) 

for  all  hi  > 0,hik  > 0 and  e > 0.  Since  (^[^  • (3^2  ’ ' 

mutually  independent,  (4.34)  can  be  then  written  as 

m 

E Ep_^,..,p_^,,.{{{E[Di^  - l^YCiiitia,,  - l^;)\SSHi  = hi) 

(^*2— 22.1^21^*2— 22.i^i2  ~ ^2*)}  > 0,  for  all/li,/l2*  > 0.  (4.35) 

From  the  proof  of  Theorem  2.2.3,  it  can  be  shown  that 

m 

E 221^*21^*2— 221^2  ~ ^2*}  > 0,  (4.36) 

*=1 

for  every  /12*  > 0. 

Recall  SSHi  can  be  expressed  as 

[ CK^f ’'U  - A(A^A)-'£)'f,C}J^^  |. 

Thus,  the  former  term  of  (4.35)  can  be  shown  as 

^^5.,  ■■■,«., ,>{<r(c|«£0f’')(/  - A{A^Ar'A^)0*cl{^\ 

\>r  a- A(A^Ar^A^)it  £.]{']  = '>.}■ 


(4.37) 




Now,  let 

(^1)  ■ • • j^m)  = ^11  (^11’ ■ ■ ■ (4.38) 

The  distribution  of  («!,•••, is  There 

exists  an  orthogonal  matrix  such  that 


P^U  - A{A^A)~^A^)P  = 


2(m-r)xr 


Qrx(m-r)  Qrxr 

From  (4.38)  and  (4.39),  the  latter  part  of  (4.37)  can  be  expressed  as 


(4.39) 


i-m— r 


2(m— r)xr 


2rx(m-r)  QrXr 


(4.40) 


where  (^^,  ■ • = (aj, . . • ,a,„)Pand  is  distributed  as 

/m)-  Now,  (4.40)  can  be  rewritten  as 

. / II,  . 

} 


tr{ 
= ir{ 


4>i 


[^1,- 

’ -I-m^ 

i-m—T 

Qrx(m-r) 

Q(m-r)xr 

Qrxr 

) = E 


(4.41) 


Q(m-r)xr 
2rx(ni-r)  Qr 

where  U = Diag{f^^^,  • • • 

Since  ^^’s  are  independent  normal  vectors,  each  has  the  mean  equal  to  the  A:th 
column  of  Cj(  )P  and  the  variance-covariance  matrix  equal  to  /p^,  and  from 

(4.41),  (4.37)  can  be  then  expressed  as 

m—r 


Eiwjr^jEfe 

*«i  y=i 


m—r 


= Y:\(E{±y)?±,\£ij,\>o. 

*=l 


(4.42) 


The  previous  statement  again  is  based  on  the  argument  in  Theorem  6.2  of  Lehmann 
(1983).  Theorem  4.2.2  is  hence  proved  as  a consequence  from  (4.36)  and  (4.42). 

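Finally, based on the hypothesis $H_0: \beta_{k1} = D_1 a_k$ and $\beta_{k2} = 0$ for all $k$, a general
PTE is given by

$$ e_{kPTE} = X_1 (\hat D_1 a_k + g(F)(\hat\beta_{k1} - \hat D_1 a_k)) + X_2\, g(F) \hat\beta_{k2}, \tag{4.43} $$

where $g(F) = I_{[F > d]}$ with $d$ a positive constant depending upon the chosen level of
significance, and $I$ the usual indicator function. The corresponding modified EB
estimator,

$$ e_{kMEB} = X_1 \Big( \hat D_1 a_k + \Big(1 - \frac{\phi_0(F)}{F}\Big) g(F) (\hat\beta_{k1} - \hat D_1 a_k) \Big)
   + X_2 \Big(1 - \frac{\phi_0(F)}{F}\Big) g(F) \hat\beta_{k2}, \quad k = 1, \ldots, m, \tag{4.44} $$

dominates $e^{*}_{PTE} = (e^{T}_{1PTE}, \ldots, e^{T}_{mPTE})^T$ under certain conditions.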

Theorem 4.2.3. Consider the loss (4.3) with $Q = V^{-1}$. If $0 < \phi_0(F) < 2(mp - r p_1 - 2)$, then

$$ R(X\beta_1, \ldots, X\beta_m;\ e_{1MEB}, \ldots, e_{mMEB}) \le R(X\beta_1, \ldots, X\beta_m;\ e_{1PTE}, \ldots, e_{mPTE}). \tag{4.45} $$

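Proof of Theorem 4.2.3. By using (4.21), and following the procedure in the proof
of Theorem 2.2.4, (4.45) can be obtained immediately.

Note that $e_{kMEB}$ estimates $X\beta_k$ by $X_1 \hat D_1 a_k$ if $g(F) = 0$, i.e., the null hypothesis
is accepted at a desired level of significance, and it estimates $X\beta_k$ by an EB type
estimator as described earlier in this section otherwise. Also, the
modified EB estimator $e^{*}_{MEB} = (e^{T}_{1MEB}, \ldots, e^{T}_{mMEB})^T$ can be further improved by a
positive part modified estimator $e^{+*}_{MEB} = (e^{+T}_{1MEB}, \ldots, e^{+T}_{mMEB})^T$, defined as

$$ e^{+}_{kMEB} = X_1 \Big( \hat D_1 a_k + \Big(1 - \frac{\phi_0(F)}{F}\Big)^{+} g(F) (\hat\beta_{k1} - \hat D_1 a_k) \Big)
   + X_2 \Big(1 - \frac{\phi_0(F)}{F}\Big)^{+} g(F) \hat\beta_{k2}, \quad k = 1, \ldots, m. \tag{4.46} $$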
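The difference between the PTE and its modified EB counterpart lies only in the weight given to the unrestricted fit. The following sketch computes that weight for (4.43), (4.44) and (4.46), taking $\phi_0(F)$ to be a constant `c` for simplicity; the function names and the numerical values are hypothetical and used for illustration only.

```python
def pte_weight(F, d):
    """g(F) = I[F > d]: all-or-nothing weight on the unrestricted fit, as in (4.43)."""
    return 1.0 if F > d else 0.0


def meb_weight(F, d, c, positive_part=False):
    """(1 - c/F) * g(F), the weight in (4.44); truncated at zero for (4.46).

    c plays the role of a constant phi_0(F); d is the positive critical value.
    """
    if F <= d:
        return 0.0  # H0 accepted: fall back on the restricted fit
    weight = 1.0 - c / F
    return max(weight, 0.0) if positive_part else weight


# Illustrative values: c = mp - r*p1 - 2 = 14 and an assumed critical value d = 4.
weights = [(F, pte_weight(F, 4.0), meb_weight(F, 4.0, 14.0, positive_part=True))
           for F in (3.0, 7.0, 28.0, 280.0)]
```

Once $H_0$ is rejected the PTE jumps straight to the LSE, whereas the modified estimator moves towards it gradually as F grows, which is the sense in which it accounts for the degree of evidence against $H_0$.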

Now, $e^{*}_{EB}$ and other associated estimators of $(X\beta)^*$ are compared by the Bayes
risk. In view of Theorem 4.2.1, the Bayes risk of $e^{*}_{EB}$ is smaller than that of $e^{*}_{LSE}$
when $Q = V^{-1}$ regardless of the prior, while $e^{+*}_{EB}$ always dominates $e^{*}_{EB}$
irrespective of any prior. However, the Bayes risk optimality of $e^{*}_{EB}$ over $e^{*}_{LSE}$ is
not limited to $Q = V^{-1}$. The following theorem will show this superiority of $e^{*}_{EB}$
over $e^{*}_{LSE}$ under the prior presented in the beginning of this section.


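Theorem 4.2.4. Consider the model $Y_k \sim N(X\beta_k, \sigma^2 V)$ for $k = 1, \ldots, m$,
where $V$ is p.d., and $\beta_k \sim N\big( ((D_1 a_k)^T, 0^T)^T,\ \tau^2 (X^T V^{-1} X)^{-1} \big)$ for all $k$. Let $\xi$ denote
this given prior, and $r(\xi;\ e_1, \ldots, e_m)$ the Bayes risk of an estimator $(e_1, \ldots, e_m)$ of
$(X\beta_1, \ldots, X\beta_m)$ under the prior $\xi$ when the loss is given in (4.3). Then,

$$ r(\xi;\ e_{1EB}, \ldots, e_{mEB}) < r(\xi;\ e_{1LSE}, \ldots, e_{mLSE}). \tag{4.47} $$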

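Proof of Theorem 4.2.4. Let

$$ e_{kc} = X_1 \Big( \hat D_1 a_k + \Big(1 - \frac{c}{F}\Big)(\hat\beta_{k1} - \hat D_1 a_k) \Big) + \Big(1 - \frac{c}{F}\Big) X_2 \hat\beta_{k2}, \quad k = 1, \ldots, m. \tag{4.48} $$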

According  to  the  loss  (4.3), 

^ ( ^ I ® Ic ) ■ ■ ■ ) ^mc  ) 
m 

= E ‘H  {E(l,,  - a,,  - - c.o.)) 

- j;(i,  - ) 

m 

+ Z H {Ell,  - 1,1  -fill-  Eiii)) 

(L  -&1-  jLf)2LlQ2Ci  I 

m 

jfc=i  ^ 

(It,  - fin  - ^(iti  - b,it)f}XlQ^X,  I 

+ ^/'lE:(l,-Li-jlMt,-it,-jL,yulQKi\,  (4.49) 

in  which 

E:ili-iti  - jCiti  - biotml,  - ^Il^-b,^)f 
- ^(jdi  - itiMti  - bi^f)  - E(jil,- 


and 




= o 


F 

2 c{N  - mp) 


E{- 


iV  - mp  + 2 ^ SSHi  + SSHi 

iL-D,a,n 


(kn-E(L,\^)) 


(4.51) 


By  using  (4.5),  (4.51)  can  be  expressed  as 

1 


- my) 


N-mp  + 2 ^SSHi  + 

+ [DiO^  - DjUi) - SiOjt)^}) . 


{{K-D,a^){K-D,a,) 


(4.52) 


By  using  (4.11),  the  independence  between  and  the  independence  be- 
tween and  SSHi,  and  that  between  and  - Dig,.,  and  also  ED^ 

= ^1,  from  Basu’s  theorem,  and  using  Lemma  1 in  Ghosh,  Saleh,  and  Sen  (1987), 
Equation  (4.52)  equals  to 

a^B 

N~n 

c{N  — mp) 


c(iV-mp)  Vij^^-biak)  CnCi^VjljC^iC-l 
N ~mp  + 2^ E{SSHi  + SSHi)  E{SSHi  + SSHi)  ^ 


= a^B 


{Cn.2-{aliA^A)  '^)Cr/),  (4.53) 


{N  — mp  + 2)(mp  — rpi) 
According  to  the  same  argument  in  (4.53), 


^i^iL-Dra,)0^,-b,a,)^) 

= ~ ^P)  i^n.2  - id(A^A)-^gj,)C:[i^) 

N - mp  + 2 [mp  - rpi)  (mp  - rpi  — 2) 

By  combining  (4.50),  (4.53)  and  (4.54),  the  first  term  of  (4.49)  becomes 


(4.54) 


m{cr  [ Gii^2(^fQ2Ci)  ]}  + (- 


- 2c)o^B- 


N — mp 


mp  — rpi  — 2 ^ [N  — mp  -|-  2)  (mp  — rpi) 

tr[{[m-  r)C-'  + QX,  ].  (4.55) 


Now,  applying  Basu’s  theorem  and  Lemma  1 in  Ghosh,  Salen,  and  Sen  (1987), 

\T 

„2 


F- 

c 


= - E{PJY,)f)  + E{^0jl] 

N — mp 


= ^'C;2\  + { 


mp  - rpi  - 2 


- 2c)a^B 


[N  — mp  + 2)(mp  — rpi) 


Q.22.1-  (4.56) 




From  (4.56),  the  last  term  of  (4.49)  is  identical  to 


I Cn'AXf  QX,)  1}  + ( 

— mp  - rpi  — 2 

tr[mC;,\XlQ^X,]. 


2c)a^B 


N — mp 

[N  — mp  + 2)  [mp  — rpi) 


(4.57) 


By  using  the  same  argument,  the  cross-product  term  becomes 


^(^*1  - L.  - jiL  - ^xa*))(^,,  - 0,^  - 
= - E[j0^^  - 4a*)(^2  - E[0jY,)f) 

— + E[^iK  - 

— (Sir2^12^22^) 

— ( 2c)a^B ^ ~ r~^r  r'~^ 

mp  — rpi  —2  [N  — mp  + 2)  [mp  — rpi)  — 12— 22.1- 


(4.58) 


From  (4.58),  the  cross-product  term  in  (4.49)  can  be  expressed  as 


{ X72a  tr  [ ]} 


- ( 


mp  -rpi  —2 
2 


- - 2c)a^-B 


° (AT  - mp  + 2)(mp  - rpi)  ^ 


- rnahr[C^,\C,,Cl,^XlQX,  ] - ( 


mp  - rpi  — 2 


- 2c)a^B 


N- 


mp 


tr  [ 1^0^22  Q.2iQ.ii,2X.i  0^X2  ]}• 

Finally,  by  combining  (4.55),  (4.57)  and  (4.59),  it  can  be  obtained  that 


[N  - mp  + 2)  (mp  - rpi) 
(4.59) 


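$$ r(\xi;\ e_{1c}, \ldots, e_{mc}) = m\sigma^2\, tr[ (X^T V^{-1} X)^{-1} X^T Q X ]
   + \Big( \frac{c^2}{mp - r p_1 - 2} - 2c \Big) \sigma^2 B\, \frac{N - mp}{(N - mp + 2)(mp - r p_1)}
   \big( tr[ m G C_{22.1}^{-1} ] + tr[ (m - r) C_{11}^{-1} X_1^T Q X_1 ] \big), \tag{4.60} $$

where $G = (X_1 C_{11}^{-1} C_{12} - X_2)^T Q (X_1 C_{11}^{-1} C_{12} - X_2)$. From (4.60), it can be concluded
that $r(\xi;\ e_{1c}, \ldots, e_{mc}) < r(\xi;\ X\hat\beta_1, \ldots, X\hat\beta_m)$ when $0 < c < 2(mp - r p_1 - 2)$. Since
$e^{*}_{c} = e^{*}_{EB}$ if $c = mp - r p_1 - 2$, Theorem 4.2.4 is proved.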




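The next theorem gives necessary and sufficient conditions under which the
Bayes risk of $e^{*}_{EB}$ is smaller than that of $e^{*}_{RMLE}$.

Theorem 4.2.5. Consider the set up of Theorem 4.2.4. Then,

$$ r(\xi;\ e_{1EB}, \ldots, e_{mEB}) < r(\xi;\ X_1 \hat D_1 a_1, \ldots, X_1 \hat D_1 a_m) \tag{4.61} $$

if and only if $(1 - B)^2/B^2 > 1 - (mp - r p_1 - 2)(N - mp)/[(mp - r p_1)(N - mp + 2)]$
with $B = \sigma^2/(\sigma^2 + \tau^2)$.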

Proof  of  Theorem  4.2.5.  By  expressing 

Kl^lQ,k  — — Ep  ...^p  ,a'^{D.\&k)) 

— 1 — m 

+ Ki{Ep  0 2{big^)  - 

it  can  be  seen  that 

^{X^i,  ■ • • ^ X^^  \ XiD-iOli,  • • • ,Xibia^) 

m 

= E{f|i'(aat)(zr2z,)i 

t=i 

+ 2tr[^^^E{b,a,  - {XIQ{XiQxIC,,  - X,))  ] 

+ (4.62) 

where  G_  = [XiC_^iQ^i2  ~ X2)^0^  iXiC^i  C_i2  — ^2)-  Since 

2.  rHK^V-'X)-%  (4.63) 

then, 

^Ri^ir  -.D^a„,T^{^k2El^.-  -,0^{Diak  ~ 

= Cov{E0^^...jjb,ak  - lki)^^k2) 




= Covlig^  + - U),0*^U) 

~ 11  — 12— 22.1  ^ ~ l) 

= Q>  (4.64) 

where  ^ z = 1, 2 and  1*  is  the  A;th  column  of  7„.  And 

^D,a„-,D,a^,T^{E{Dia^  - 

= V{{§*^  - )(1,  - A{A^A)-\,)) 

= r^(l  - al{A^A)-^a,){C-,\  - C~,^ 

= ^{A^A)-^Oj,)C;l.  (4.65) 

By  using  (4.62), (4.63),  (4.64),  and  (4.65),  it  can  be  obtained  that 

2C.l^lQii,  • • • i 2Li^ia,n) 

+ r^{m-r)tr[C:[,^XjQX,] 

+ mir[C:^l^G].  (4.66) 

Now, by using (4.60) with $c = mp - r p_1 - 2$ and (4.66),

$$ r(\xi;\ X_1 \hat D_1 a_1, \ldots, X_1 \hat D_1 a_m) - r(\xi;\ e_{1EB}, \ldots, e_{mEB})
   = \Big( \tau^2 - \sigma^2 + \sigma^2 B\, \frac{(N - mp)(mp - r p_1 - 2)}{(N - mp + 2)(mp - r p_1)} \Big)
   \{ (m - r)\, tr[ C_{11}^{-1} X_1^T Q X_1 ] + m\, tr[ C_{22.1}^{-1} G ] \}. \tag{4.67} $$

Thus, the conclusion of the theorem follows from (4.67).

The consequence of the above theorem is clear. If $\sigma^2 < \tau^2$, then $B < 1/2$, in
which case $(1 - B)^2/B^2 > 1$. Therefore, $e^{*}_{EB}$ always dominates $e^{*}_{RMLE}$ in its Bayes
risk if the sampling variability is smaller than the prior variability. However, if $\tau^2$
is much smaller than $\sigma^2$, $e^{*}_{RMLE}$ becomes a more appropriate estimator of $(X\beta)^*$.
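The condition in Theorem 4.2.5 is easy to evaluate for given variance components. The sketch below codes the condition on $B$ and, separately, the sign of the leading factor in (4.67); the two should agree, since $(1 - B)^2/B^2 > 1 - \kappa$ is algebraically equivalent to $\tau^2 - \sigma^2 + \sigma^2 B \kappa > 0$ with $\kappa = (mp - r p_1 - 2)(N - mp)/[(mp - r p_1)(N - mp + 2)]$. The symbol $\kappa$, the function names and the example dimensions are introduced here for illustration only.

```python
def kappa(N, m, p, r, p1):
    """(mp - r*p1 - 2)(N - mp) / [(mp - r*p1)(N - mp + 2)]."""
    q = m * p - r * p1
    return (q - 2) * (N - m * p) / (q * (N - m * p + 2))


def eb_beats_rmle(sigma2, tau2, N, m, p, r, p1):
    """Condition of Theorem 4.2.5: (1 - B)^2 / B^2 > 1 - kappa."""
    B = sigma2 / (sigma2 + tau2)
    return ((1.0 - B) / B) ** 2 > 1.0 - kappa(N, m, p, r, p1)


def risk_gap_factor(sigma2, tau2, N, m, p, r, p1):
    """Leading factor of (4.67); positive when the EB Bayes risk is smaller."""
    B = sigma2 / (sigma2 + tau2)
    return tau2 - sigma2 + sigma2 * B * kappa(N, m, p, r, p1)


# Illustrative dimensions (assumed): m = 3, p = 8, r = 2, p1 = 4, N = 48.
dims = dict(N=48, m=3, p=8, r=2, p1=4)
checks = [(t2, eb_beats_rmle(4.0, t2, **dims), risk_gap_factor(4.0, t2, **dims))
          for t2 in (1.0, 4.0, 9.0)]
```

With $\sigma^2 = 4$ under these assumed dimensions, the condition fails at $\tau^2 = 1$ and holds at $\tau^2 = 4$ and $\tau^2 = 9$, in line with the remark that the EB estimator is preferable once the prior variability is not small relative to the sampling variability.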




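A general class of minimax estimates of $(X\beta)^*$, which was described earlier in
this section, can be expressed as

$$ e_{kc} = X_1 \Big( \hat D_1 a_k + \Big(1 - \frac{c}{F}\Big)(\hat\beta_{k1} - \hat D_1 a_k) \Big) + X_2 \Big(1 - \frac{c}{F}\Big) \hat\beta_{k2}, \quad k = 1, \ldots, m, \tag{4.68} $$

where $c$ is a proper constant. The following theorem shows that $e^{*}_{EB}$ is the optimal
estimate of $(X\beta)^*$ within the class of all estimators of the form $e^{*}_{c} = (e^{T}_{1c}, \ldots, e^{T}_{mc})^T$.

Theorem 4.2.6. Consider the same set up as in Theorem 4.2.4. Then,

$$ r(\xi;\ e_{1c}, \ldots, e_{mc}) - r(\xi;\ e_{1B}, \ldots, e_{mB})
   = \sigma^2 B m\, tr[ (X^T V^{-1} X)^{-1} X^T Q X ]
   + \Big( \frac{c^2}{mp - r p_1 - 2} - 2c \Big) \sigma^2 B\, \frac{N - mp}{(N - mp + 2)(mp - r p_1)}
   \{ m\, tr[ G C_{22.1}^{-1} ] + (m - r)\, tr[ C_{11}^{-1} X_1^T Q X_1 ] \}. \tag{4.69} $$

For $mp - r p_1 > 3$, the above risk is minimized at $c = mp - r p_1 - 2$.

Proof of Theorem 4.2.6. From the proof of Theorem 2.2.8, it can be shown that

$$ r(\xi;\ e_{1B}, \ldots, e_{mB}) = \sigma^2 (1 - B) m\, tr[ (X^T V^{-1} X)^{-1} X^T Q X ]. \tag{4.70} $$

Then, Eq. (4.69) can be proved by combining (4.60) and (4.70).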

4.3  Empirical  Bayes  Subset  Estimators 

In this section four empirical Bayes subset estimators of $\beta_{k1}$ are discussed and
compared under the loss in (3.69),

$$ L(\beta_{k1};\ e_{k1}) = (e_{k1} - \beta_{k1})^T U_k (e_{k1} - \beta_{k1}). $$

The multivariate model is the same set up as given in Theorem 4.2.4.

The shrinkage coefficients corresponding to the four empirical Bayes estimators
will be derived from the usual F ratio for testing $H_0: \beta_{k2} = 0$. If $e^{1}_{k1}$ is defined as
an estimator of $\beta_{k1}$ for some $k$ that shrinks the unrestricted least squares estimator
$\hat\beta_{k1}$ to the restricted least squares estimator $\tilde\beta_{k1}$ under the hypothesis $H_0: \beta_{k2} = 0$,
then $e^{1}_{k1}$ can be expressed as

$$ e^{1}_{k1} = \hat\beta_{k1} - \frac{\phi(F_k)}{F_k} (\hat\beta_{k1} - \tilde\beta_{k1}), \tag{4.71} $$

where $F_k = (N - mp + 2)\, \hat\beta_{k2}^T C_{22.1} \hat\beta_{k2} / SSE$.

Now, $e^{2}_{k1}$ is defined as an estimator that shrinks the unrestricted least squares
estimator $\hat\beta_{k1}$ to the restricted maximum likelihood estimator $\hat D_1 a_k$ under the
hypothesis $H_0: \beta_{k1} = D_1 a_k$ and $\beta_{k2} = 0$. If the shrinkage coefficient,
$\phi(F_k)/F_k$, is the same as described in $e^{1}_{k1}$, then

$$ e^{2}_{k1} = \hat\beta_{k1} - \frac{\phi(F_k)}{F_k} (\hat\beta_{k1} - \hat D_1 a_k). \tag{4.72} $$

Replacing the shrinkage coefficient $\phi(F_k)/F_k$ by $\phi(F)/F$ of $e_{k\phi}$ in (4.71) yields
a third estimator as

$$ e^{3}_{k1} = \hat\beta_{k1} - \frac{\phi(F)}{F} (\hat\beta_{k1} - \tilde\beta_{k1}), \tag{4.73} $$

where $F = (N - mp + 2)(SSH_1 + SSH_2)/SSE$ has been mentioned in Section 4.2.

The fourth one is a subset of $e_{k\phi}$ described in Section 4.2. Thus, $e^{4}_{k1}$ can be
expressed as

$$ e^{4}_{k1} = \hat\beta_{k1} - \frac{\phi(F)}{F} (\hat\beta_{k1} - \hat D_1 a_k). \tag{4.74} $$

The comparison among these estimators is based on their frequentist risks and
Bayes risks. Now, these estimators of $\beta_{k1}$ are first evaluated for some individual $k$
according to the Bayes criterion. Let $\phi(F) = \phi(F_k) = c$, a positive constant, and, as
a convenience, rewrite $e^{i}_{k1}$ as $e^{i}_{ck1}$ for $i = 1, 2, 3, 4$. The following theorem shows that
under the prior described in Section 4.2 the Bayes risk of $e^{4}_{ck1}$ with $c = mp - r p_1 - 2$,
i.e., the subset of $e_{kEB}$, is the smallest among the four empirical Bayes estimators
of $\beta_{k1}$ irrespective of any p.d. $U_k$.



Theorem 4.3.1. Consider the prior $\xi$ in Theorem 4.2.4. Let $r(\xi, e_{k1})$ denote the
Bayes risk of an estimator $e_{k1}$ of $\beta_{k1}$ for some individual $k$ under the prior $\xi$ when
the loss is given in (3.69). Then,

= min{mm{r(e,^'ti)},mm{r(e,^^i)}, 


mm{r(e,^^i)} 

= max{mm{r(e,^\J},mm{(e,e,2*J}, 
mm{r(e,  mm{  ( e,  e^^i) }}. 


Proof  of  Theorem  4.3.1.  From  the  proof  of  Theorem  3.3.1,  it  can  be  obtained 


that 


[ Cil^U-k  ] + ( 1 ^ o ~ 


{N  -mp  + 2)p2  ^P2  - 2 


(4.75) 


which  is  minimized  at  c = pj  - 2.  The  minimum  of  is 


r(e,^p,_2)*i) 


^2^(iV'-mp)(p2  -2) 
{N  - mp  + 2)p2 


[ {Cii\ 


Cn^Uk]- 


(4.76) 


By  following  the  proof  of  Theorem  3.3.1,  using  instead  of  , and  consid- 
ering = (^'  + ^=')(l-an4"'A)-'ajt)Cri' 

and  the  independence  of  DiQ^,  and  it  can  be  shown  that 


N — mp 


(- 


-2c) 


{N  -mp  + 2)p2  'p2  - 2 
X [ ((1  -^{A^A)-^gj,)C:,^  + {C-,\  - C:,^])U,  I,  (4.77) 




which  is  minimized  at  c — p2  — 2.  Thus,  the  minimum  of 

[N  - mp  + 2)p2 

X tr  [ ((1  - a*U^A)-'a*)C-i  + - C^,^))U,  ]. 

The  Bayes  risk  of  also  can  be  expressed  as 


(4.78) 


+ 


N — mp 


a B- oci 

{N  — mp  + 2)  (mp  — rpi)  ^ mp  — rpi  — 2 

^ ^^{{Q.11.2  ~ Q.n)lLk]^ 


(4.79) 


which  is  minimized  at  c — mp  — rpi  — 2.  Thus,  the  minimum  of  becomes 

X ^'''[{Q-11.2  ~ Q-\l)lLk]-  (4.80) 

By  using  (4.55), 


^2^  (iV  - mp)  (mp  - rpi  - 2) 
{N  — mp  + 2)  (mp  — rpi) 


N — mp 


+ a^B- 


:i- 


{N  - mp  + 2)  (mp  — rpi)  ^ mp  — rpi  — 2 

X ^r[((l  -a[(4^A)-^^)CiY  + (C-^2  - Cri'))C^l, 

which  is  minimized  at  c = mp  — rpi  — 2.  Thus,  the  minimum  of  *■(^,6^*1) 


(4.81) 


- rnp)  (mp  - rpi  - 2) 


(iV  — mp  + 2)  (mp  — rpi) 

X [ (1  - a[ (4"'A)"'^)Ci7  + {Cll^  - C-^^))U^  ].  (4.82) 


Comparing  (4.76),  (4.78),  (4.80)  and  (4.82),  Theorem  4.3.1  follows. 

The condition described in Theorem 4.3.1 is clearly expected. This is because
$e^{4}_{k1}$ uses all of $Y_1, \ldots, Y_m$ in estimating $D_1$ as well as $1/(\sigma^2 + \tau^2)$, while the other
three estimators use only $Y_k$ but not all of $Y_1, \ldots, Y_m$ for estimating either $D_1$ or
$1/(\sigma^2 + \tau^2)$ or both.




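If $c$ is assumed to satisfy the conditions $0 < c < 2(p_2 - 2)$ in $e^{1}_{ck1}$ and
$e^{2}_{ck1}$, and $0 < c < 2(mp - r p_1 - 2)$ in $e^{3}_{ck1}$ and $e^{4}_{ck1}$, it is then easy to demonstrate
that each of these four estimators of $\beta_{k1}$ for some $k$ is better than $\hat\beta_{k1}$.

Suppose one expresses

$$ \tilde\beta_{k1} - \beta_{k1} = (\tilde\beta_{k1} - E(\tilde\beta_{k1})) + C_{11}^{-1} C_{12} \beta_{k2}, $$

and gets

$$ r(\xi, \tilde\beta_{k1}) = \sigma^2\, tr(C_{11}^{-1} U_k) + \tau^2\, tr[ (C_{11.2}^{-1} - C_{11}^{-1}) U_k ]. \tag{4.83} $$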


Therefore, 


= I (gu  - g.‘)g>  I > 0,  (4.84) 

if  and  only  if  (l  - BY/B^  > 1 - {N  - mp){p2  - 2)  j {{N  - mp  + 2)p2),  and 


^ + ’’'^W^2)py  I (C-  - )£»  I > 0. 

if  and  only  if  (1  - 5)75*  > I - [N  - mp){p2  - 2)/((iV  - mp  + 2)^2). 
Also  based  on  the  same  procedure  given  above, 


(4.85) 


[N  — mp)  imp  — rpi  — 2)  , , , 


if  and  only  if  (1  - BY/B^  > 1 - (JV  - mp)(mp  - rp,  - 2)/((iV  - mp  + 2)(mp  - rp,)), 
and 




= B 


{N -mp){mp-rpi-2).  ,,  t/  t ^ i ^ i 


+ 


[N  — mp  + 2)  (mp  — rpi) ' 

if  and  only  if  (l  - Bf  / >l-{N-mp)  [mp  - rp^-2)  / {{N  - mp  + 2)  (mp  - rpi)). 

The  results  of  (4.84),  (4.85),  (4.86),  and  (4.87)  are  not  surprising  since  is 
expected  to  perform  better  than  ej^i’s  (i  = l,2,3,4)  when  the  prior  variance  is 


much  smaller  than  the  sampling  variance  cr^.  However,  all  these  four  4*1 ’s  can  have 
smaller  Bayes  risks  than  if  > a^. 

Next,  the  frequentist  criterion  is  utilized  to  evaluate  the  four  estimators.  From 


Eq.  (3.94),  it  can  be  shown  that 


X etCu  S.2|j,/(SSE/(JV  - mp  + 2)) 

dy^iFiA  ~T 

The  frequentist  risk  of  can  be  then  obtained  as 

= -Ej.. 

- 2ir  I - 0J) 

HFk),!, 


X 


F,  '-*1 


(^*1  ^11^^12^1.2)  ^}£^* 


- - b,a,)  + 


(4.88) 


(4.89) 




By using a generalization of Stein's identity (Berger and Haff, 1983), the
cross-product term of (4.89) can be expressed as


Fk 

\T 


- - D,^f)/{SSE/{N  -mp  + 2)) 

^{Fk)  I _T  I aT  A\-l 


+ 


■(£^11.2  (^*(A  A)  ^ QLk)Q.n)}Uji 


(4.90) 


From  (4.89)  and  (4.90), 


n 


^ [SSEI{N  -mp  + 2)) 

+ 1 (cr.*,  - (a»(4^4)-‘a.)e,-i‘)a  I) 


+ - bmk) 

- 2(Cu&22„)''&(^,  - Cia*)  + ^,e2ieri'a.Cn  &2^.,)}- 


(4.91) 


By  using  the  same  argument,  it  can  be  shown  that 

= o^tr{Cii,2Ujc)  - 2<y^-^g^,  -,g^,an{2(-^  ^ - ^^) 

+ [SSEI[N  -mp  + 2))  + ^tr((C^,\  - Cr,^)t^)} 

+ ^4..  ^,.»=(^a22e^iSne*eri‘ei2^„), 


(4.92) 


and 


RiLv+i) 

= o^tr{C_A2lLk)  - - ^^) 




((i,  - - b,a,)  - 2(1^,  - 

+ £^C^iC:;Uj,C:;C,,IJ/{SSE/{N  -mp+  2)) 

+ ^ir  I (Cfi  - (aj- U’'4)-‘a*)cri‘)&  |} 

+ ^£,. •■  ^,.-“{^((1..  - - b,a,) 

- 2(lt, -4at)''i£*£ri‘£i2^tj+ls,ai£ri‘e*Gu£i2^„}-  (4.93) 

Investigations  of  Eqs.(4.88),  (4.91),  (4.92)  and  (4.93)  indicate  that  there  is  no 
evident  winner  among  these  four  estimators  as  demonstrated  by  the  comparisons  of 
their  frequentist  risk  with  the  loss  defined  in  (3.69).  This  remains  true  even  though 
(f>{F)  = (f>[Fk)  is  equal  to  a positive  constant  c. 

4.4  Numerical  Examples 

To  illustrate  the  behavior  of  the  estimators  derived  in  the  previous  sections,  a 
Monte-Carlo  experiment  was  conducted  to  assess  the  frequentist  risks  as  well  as  the 
Bayes  risks  of  the  different  estimators. 

The same experiment described in Section 3.4 is used here. Random samples
of $m = 3$ normally distributed random vectors $Y_k$ ($k = 1, 2, 3$) of dimension $16 \times 1$
are taken in a $2^3$ factorial experiment. Here $Y_k$ is the vector response of the $k$th
individual, which is based on a $2^3$ factorial experiment replicated two times (see
Section 2.4). It is assumed that

$$ E(Y_{ijk}) = \gamma_{0k} + \gamma_{1k}\xi_{1i} + \gamma_{2k}\xi_{2i} + \gamma_{3k}\xi_{3i}
   + \gamma_{12k}\xi_{1i}\xi_{2i} + \gamma_{13k}\xi_{1i}\xi_{3i} + \gamma_{23k}\xi_{2i}\xi_{3i} + \gamma_{123k}\xi_{1i}\xi_{2i}\xi_{3i}, $$

where $i = 1, \ldots, 8$, $j = 1, 2$, and $k = 1, 2, 3$. Recall that $\beta_{k1}^T = (\gamma_{0k}, \gamma_{1k}, \gamma_{2k}, \gamma_{3k})$,
$\beta_{k2}^T = (\gamma_{12k}, \gamma_{13k}, \gamma_{23k}, \gamma_{123k})$, $m p_1 = m p_2 = 12$ and $N = 48$. The design matrix
$X$ can be partitioned as $X = (X_1, X_2)$, for all $k$, where

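\[
X_1^T = \left[\begin{array}{*{16}{r}}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 \\
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 & 1 & -1 & -1
\end{array}\right]
\]

and

\[
X_2^T = \left[\begin{array}{*{16}{r}}
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 \\
1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 & 1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 & 1 & 1 & 1 & -1 & -1
\end{array}\right].
\]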

It is assumed here that $V(Y_k) = I_{16}$ for all $k$.
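The design described above can be rebuilt programmatically as a check on the matrices. The following is a minimal sketch, assuming the usual $-1/+1$ coding and the run order implied by $X_1^T$ (first factor varying slowest, each treatment combination listed twice in a row); the function name `design_2cubed` is illustrative and does not come from the original study.

```python
import itertools
import numpy as np

def design_2cubed(replicates=2):
    """Return (X1, X2) for a replicated 2^3 factorial in -1/+1 coding."""
    # Levels run +1 before -1; the first factor varies slowest.
    runs = np.array(list(itertools.product([1, -1], repeat=3)), dtype=float)
    # Each treatment combination is repeated `replicates` times consecutively.
    runs = np.repeat(runs, replicates, axis=0)            # 16 x 3
    x1, x2, x3 = runs.T
    # X1: intercept and main effects; X2: two- and three-factor interactions.
    X1 = np.column_stack([np.ones(len(runs)), x1, x2, x3])
    X2 = np.column_stack([x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3])
    return X1, X2

X1, X2 = design_2cubed()
X = np.hstack([X1, X2])
print(X.shape)                                  # (16, 8)
print(np.allclose(X.T @ X, 16 * np.eye(8)))     # True
```

Because the columns are mutually orthogonal, $X^T X = 16 I_8$, which is what makes the full and reduced least squares fits easy to compare in this example.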

First, the simulated frequentist risks were evaluated based on the loss in (4.3)
with $Q_k =$ for all $k$. Random vectors $Y_k$'s are generated from $N(X\beta_k, I_{16})$ in
each replication. The sampling procedure is repeated 1,000 times in each simulation.
The estimated frequentist risks of estimating $X\beta_k$ ($k = 1,2,3$), i.e., the average
squared error losses of the different estimators, are compared.

The choice of the $\beta_k$'s is motivated by a null hypothesis $H_0 : \beta_k = Da_k$
and departures from this null hypothesis. Accordingly, these $\beta_k$'s are of the form
$\beta_k = Da_k + r_k$ (the $r_k$'s denoting the deviations from the null hypothesis for each
$k$).

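The matrix $D$ is chosen as

\[
D^T = (D_1^T, D_2^T) = \left[\begin{array}{rrrrrrrr}
0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
2 & -2 & 3 & -2 & 0 & 0 & 0 & 0
\end{array}\right],
\]

and

\[
a_1 = \begin{bmatrix} -1 \\ 3 \end{bmatrix}, \quad
a_2 = \begin{bmatrix} -1 \\ 2 \end{bmatrix} \quad \text{and} \quad
a_3 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.
\]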


There are several choices of the $r_k$'s. In the first case, $r_k = 0$ for all $k$; in the
second case, $r_1 = r_3 = 0$ and $r_2 = (1, 0, -1, 1, 0, 0, 0, 0)^T$; in the third case,
$r_1 = r_3 = 0$ and $r_2 = (1, 0, -1, 1, 1, 2, 3, 4)^T$; and in the last case, $r_1 = 0$,
$r_2 = (1, 0, -1, 1, 1, 2, 3, 4)^T$ and $r_3 = (0, 0, 0, 0, 1, 1, 1, 1)^T$. The results of the
simulated frequentist risks are shown in Table 4.1.
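The four parameter configurations can be tabulated directly from $\beta_k = Da_k + r_k$. The sketch below assumes $a_1 = (-1, 3)^T$, $a_2 = (-1, 2)^T$ and $a_3 = (1, 1)^T$, which are the values consistent with the $\beta_k$ rows printed at the head of Table 4.1; the helper name `true_betas` and the variable names are illustrative only.

```python
import numpy as np

# D is 8 x 2 (here D_2 = 0); the columns of A are a_1, a_2, a_3.
D = np.array([[0, 1, 1, 0, 0, 0, 0, 0],
              [2, -2, 3, -2, 0, 0, 0, 0]], dtype=float).T
A = np.array([[-1, -1, 1],
              [3, 2, 1]], dtype=float)

zero = np.zeros(8)
r2_case2 = np.array([1, 0, -1, 1, 0, 0, 0, 0], dtype=float)
r2_case34 = np.array([1, 0, -1, 1, 1, 2, 3, 4], dtype=float)
r3_case4 = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

# Deviation vectors (r_1, r_2, r_3) for the four cases described in the text.
cases = {
    1: (zero, zero, zero),
    2: (zero, r2_case2, zero),
    3: (zero, r2_case34, zero),
    4: (zero, r2_case34, r3_case4),
}

def true_betas(case):
    """Return the 8 x 3 matrix whose k-th column is beta_k = D a_k + r_k."""
    return D @ A + np.column_stack(cases[case])

for c in cases:
    print(c, true_betas(c).T.astype(int).tolist())
```

In every case the first row printed is $(6, -7, 8, -6, 0, 0, 0, 0)$, since $r_1 = 0$ throughout; only $\beta_2$ and $\beta_3$ move away from the null hypothesis.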

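In the first case, $\beta_{k1} = D_1 a_k$ and $\beta_{k2} = 0$ for all $k$, the weighted average
of LSE's for the reduced model, $e^*_{RMLE} = ((X_1 D_1 \hat{a}_1)^T, (X_1 D_1 \hat{a}_2)^T, (X_1 D_1 \hat{a}_3)^T)^T$,
is the best estimator of $(X\beta)^*$. However, in the second case, which has $r_2 \neq 0$,
$e^*_{RLSE} = ((X_1 \tilde{\beta}_{11})^T, (X_1 \tilde{\beta}_{21})^T, (X_1 \tilde{\beta}_{31})^T)^T$ performs best as an estimator of $(X\beta)^*$.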
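The orders of magnitude reported for the full-model and reduced-model least squares estimators can be anticipated without simulation. The sketch below is an illustration under stated assumptions, not the computation used in the study: it takes unit error variance, the unweighted sum of squared errors as loss, and a reduced-model LSE obtained by regressing each $Y_k$ on $X_1$ alone. Under these assumptions the risk is variance plus squared bias, $\sigma^2 \, m \, \mathrm{tr}(H) + \sum_k \|(I - H_1) X \beta_k\|^2$, where $H$ and $H_1$ are the projections onto the columns of $X$ and $X_1$. The function names are hypothetical.

```python
import numpy as np

# Replicated 2^3 design in -1/+1 coding (same run order as X_1^T in the text).
lv = np.array([1.0, -1.0])
x1 = np.repeat(lv, 8)
x2 = np.tile(np.repeat(lv, 4), 2)
x3 = np.tile(np.repeat(lv, 2), 4)
X1 = np.column_stack([np.ones(16), x1, x2, x3])
X2 = np.column_stack([x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3])
X = np.hstack([X1, X2])

def hat(M):
    """Orthogonal projection onto the column space of M."""
    return M @ np.linalg.solve(M.T @ M, M.T)

def exact_risks(betas, sigma2=1.0):
    """betas is 8 x m; returns (risk of full LSE, risk of reduced LSE)."""
    H, H1 = hat(X), hat(X1)
    m = betas.shape[1]
    mu = X @ betas                       # true means X beta_k, one column per k
    bias = mu - H1 @ mu                  # part of the mean the reduced fit misses
    risk_full = sigma2 * m * np.trace(H)
    risk_reduced = sigma2 * m * np.trace(H1) + np.sum(bias ** 2)
    return risk_full, risk_reduced

# beta_1, beta_2, beta_3 (as rows) for the four cases of Table 4.1.
b1 = [6, -7, 8, -6, 0, 0, 0, 0]
betas_by_case = {
    1: np.array([b1, [4, -5, 5, -4, 0, 0, 0, 0], [2, -1, 4, -2, 0, 0, 0, 0]], float).T,
    2: np.array([b1, [5, -5, 4, -3, 0, 0, 0, 0], [2, -1, 4, -2, 0, 0, 0, 0]], float).T,
    3: np.array([b1, [5, -5, 4, -3, 1, 2, 3, 4], [2, -1, 4, -2, 0, 0, 0, 0]], float).T,
    4: np.array([b1, [5, -5, 4, -3, 1, 2, 3, 4], [2, -1, 4, -2, 1, 1, 1, 1]], float).T,
}

for case, B in betas_by_case.items():
    full, reduced = exact_risks(B)
    print(case, round(full, 4), round(reduced, 4))
```

Under these assumptions the full-model risk is 24 in every case and the reduced-model risks are 12, 12, 492 and 556, which are close to the simulated values 23.7611 and 11.9607, 11.9607, 491.961, 555.961 reported in Table 4.1.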




Table 4.1: The estimated frequentist risk results for the $2^3$ factorial experiment.

$\beta_1$                 (6 -7 8 -6 0 0 0 0)    (6 -7 8 -6 0 0 0 0)    (6 -7 8 -6 0 0 0 0)    (6 -7 8 -6 0 0 0 0)
$\beta_2$                 (4 -5 5 -4 0 0 0 0)    (5 -5 4 -3 0 0 0 0)    (5 -5 4 -3 1 2 3 4)    (5 -5 4 -3 1 2 3 4)
$\beta_3$                 (2 -1 4 -2 0 0 0 0)    (2 -1 4 -2 0 0 0 0)    (2 -1 4 -2 0 0 0 0)    (2 -1 4 -2 1 1 1 1)

The estimator of $X\beta_k$'s
$e^*_{LSE}$               23.7611                23.7611                23.7611                23.7611
$e^*_{RMLE}$              7.94026                37.4787                517.479                581.479
$e^*_{RLSE}$              11.9607                11.9607                491.961                555.961
$e^*_{EB}$                15.1057                20.9210                23.5225                23.5418
$e^{*+}_{EB}$             14.9473                20.9210                23.5225                23.5418
$e^*_{PTE.10}$            17.6193                23.8398                23.7611                23.7611
$e^*_{MEB.10}$            13.4669                20.9913                23.5225                23.5418
$e^{*+}_{MEB.10}$         13.4669                20.9913                23.5225                23.5418
$e^*_{PTE.05}$            14.5875                24.0688                23.7611                23.7611
$e^*_{MEB.05}$            12.0763                21.2337                23.5225                23.5418
$e^{*+}_{MEB.05}$         12.0763                21.2337                23.5225                23.5418
$e^*_{PTE.01}$            10.5772                25.3180                23.7611                23.7611
$e^*_{MEB.01}$            9.79580                22.6270                23.5225                23.5418
$e^{*+}_{MEB.01}$         9.79580                22.6270                23.5225                23.5418

* Note: For the right two columns, during the 1,000 replications,
the null hypothesis, $\beta_{k1} = D_1 a_k$ and $\beta_{k2} = 0$ ($k = 1,2,3$), was rejected in
every replication for all $\alpha = .10$, $.05$ and $.01$. Hence,
$e^*_{MEB.\alpha} = e^*_{EB}$ and $e^{*+}_{MEB.\alpha} = e^{*+}_{EB}$.




Substantial improvement in the estimated frequentist risk of $e^*_{EB}$ over $e^*_{LSE} =
((X\hat{\beta}_1)^T, (X\hat{\beta}_2)^T, (X\hat{\beta}_3)^T)^T$, the LSE of the model in (4.1), was observed in the
first two cases, for which $\beta_{k2} = 0$ ($k = 1,2,3$). Especially in the first case, the
estimated frequentist risk of $e^*_{EB}$ is found to be only about one half the magnitude of that
of $e^*_{LSE}$. As proved in the earlier section, the estimators $e^*_{MEB}$ or $e^{*+}_{MEB}$ performed
better than $e^*_{PTE}$. Finally, if $\beta_{k2} \neq 0$ for some $k$, e.g. the third and the fourth cases,
$e^*_{EB}$ and $e^{*+}_{EB}$ have the smallest frequentist risks. Admittedly, the risk improvement is
not significant over $e^*_{LSE}$, the LSE of the model in (4.1). All other estimators are
superior to $e^*_{RMLE}$ and $e^*_{RLSE}$, with considerably smaller estimated frequentist risks.

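Table 4.2 shows the estimated Bayes risks of all the estimators. First, $\beta_k$ ($k =
1,2,3$) are generated separately from $N(Da_k, \tau^2 (X^T X)^{-1})$ where

\[
A = (a_1, a_2, a_3) = \left[\begin{array}{rrr}
-1 & -1 & 1 \\
3 & 2 & 1
\end{array}\right],
\]

and

\[
D^T = \left[\begin{array}{rrrrrrrr}
0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
2 & -2 & 3 & -2 & 0 & 0 & 0 & 0
\end{array}\right]
\]

or

\[
D^T = \left[\begin{array}{rrrrrrrr}
0 & 1 & 1 & 0 & 0 & 2 & 0 & 1 \\
2 & -2 & 3 & -2 & 1 & 0 & 2 & 0
\end{array}\right].
\]

For a generated set of $\beta_k$'s and given $\sigma^2$, random vectors $Y_k$ ($k = 1,2,3$) are then
generated separately from $N(X\beta_k, \sigma^2 I_{16})$. Once again, the sampling procedure is
repeated 1,000 times in each simulation.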
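The two-stage sampling scheme just described can be sketched as follows for the two least squares estimators only; the empirical Bayes estimators are not reproduced here. This is a minimal illustration, assuming $A$ has columns $(-1,3)^T$, $(-1,2)^T$, $(1,1)^T$ (the values consistent with the $Da_k$ rows of Table 4.2), squared error loss, and a reduced-model LSE obtained by regressing on $X_1$ alone. The function name `simulate_bayes_risk` and the seed are hypothetical.

```python
import numpy as np

# Replicated 2^3 design in -1/+1 coding.
lv = np.array([1.0, -1.0])
x1 = np.repeat(lv, 8)
x2 = np.tile(np.repeat(lv, 4), 2)
x3 = np.tile(np.repeat(lv, 2), 4)
X1 = np.column_stack([np.ones(16), x1, x2, x3])
X = np.hstack([X1, np.column_stack([x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3])])

A = np.array([[-1, -1, 1], [3, 2, 1]], dtype=float)
D_zero = np.array([[0, 1, 1, 0, 0, 0, 0, 0],
                   [2, -2, 3, -2, 0, 0, 0, 0]], dtype=float).T   # D_2 = 0

def simulate_bayes_risk(D, A, tau2, sigma2=1.0, reps=1000, seed=0):
    """Monte Carlo Bayes risks of the full and reduced LSE of X beta_k."""
    rng = np.random.default_rng(seed)
    XtX_inv = np.linalg.inv(X.T @ X)
    chol = np.linalg.cholesky(tau2 * XtX_inv)       # g-prior covariance factor
    H = X @ XtX_inv @ X.T                           # full-model projection
    H1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)      # reduced-model projection
    loss_full = loss_reduced = 0.0
    for _ in range(reps):
        for k in range(A.shape[1]):
            # Stage 1: draw beta_k from N(D a_k, tau^2 (X'X)^{-1}).
            beta = D @ A[:, k] + chol @ rng.standard_normal(X.shape[1])
            mu = X @ beta
            # Stage 2: draw Y_k from N(X beta_k, sigma^2 I).
            y = mu + np.sqrt(sigma2) * rng.standard_normal(X.shape[0])
            loss_full += np.sum((H @ y - mu) ** 2)
            loss_reduced += np.sum((H1 @ y - mu) ** 2)
    return loss_full / reps, loss_reduced / reps

full, reduced = simulate_bayes_risk(D_zero, A, tau2=0.25)
print(round(full, 2), round(reduced, 2))
```

With $D_2 = 0$ and $\sigma^2 = 1$, the theoretical values under these assumptions are $24$ for the full LSE and $12 + 12\tau^2$ for the reduced LSE, i.e. 15, 24 and 60 for $\tau^2 = .25$, 1 and 4, which is in line with the first three columns of the $e^*_{LSE}$ and $e^*_{RLSE}$ rows of Table 4.2.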

When $\tau^2 = .25$ and $\sigma^2 = 1$, i.e., when the prior variability is smaller than the
sampling variability, the estimated Bayes risks of the different estimators maintain
almost the same order as the corresponding estimated frequentist risks. This is
reflected very clearly when $D_2 = 0$, and to a certain extent when $D_2 \neq 0$. Compare,
for example, column 2 of Table 4.2 with column 2 of Table 4.1, or column 5 of
Table 4.2 with columns 4 and 5 of Table 4.1. However, the situation need not be



Table 4.2: The estimated Bayes risk results for the $2^3$ factorial experiment.

$Da_1$                (6 -7 8 -6 0 0 0 0)               (6 -7 8 -6 3 -2 6 -1)
$Da_2$                (4 -5 5 -4 0 0 0 0)               (4 -5 5 -4 2 -2 4 -1)
$Da_3$                (2 -1 4 -2 0 0 0 0)               (2 -1 4 -2 1 2 2 1)
$\sigma^2$            1                                 1
$\tau^2$              .25        1          4           .25        1          4

The estimator of $X\beta_k$'s
$e^*_{LSE}$           23.8374    23.8374    23.8374     23.8374    23.8374    23.8374
$e^*_{RMLE}$          11.8254    23.8822    72.1095     1371.96    1384.15    1432.64
$e^*_{RLSE}$          14.8925    23.9566    60.2131     1375.02    1384.22    1420.74
$e^*_{EB}$            16.8208    19.4551    22.1019     23.7426    23.7535    23.7467
$e^{*+}_{EB}$         16.7714    19.4551    22.1019     23.7426    23.7435    23.7467
$e^*_{PTE.10}$        21.3026    23.8001    23.8374     23.8374    23.8374    23.8374
$e^*_{MEB.10}$        16.4768    19.6594    22.1019     23.7426    23.7435    23.7467
$e^{*+}_{MEB.10}$     16.4768    19.6594    22.1019     23.7426    23.7435    23.7467
$e^*_{PTE.05}$        19.4267    23.7546    23.8464     23.8374    23.8374    23.8374
$e^*_{MEB.05}$        15.8970    19.9896    22.1129     23.7426    23.7435    23.7467
$e^{*+}_{MEB.05}$     15.8970    19.9896    22.1129     23.7426    23.7435    23.7467
$e^*_{PTE.01}$        15.8870    23.9405    24.1212     23.8374    23.8374    23.8374
$e^*_{MEB.01}$        14.1719    21.2385    22.3997     23.7426    23.7435    23.7467
$e^{*+}_{MEB.01}$     14.1719    21.2385    22.3997     23.7426    23.7435    23.7467

* Note: For the right three columns, during the 1,000 replications,
the null hypothesis, $\beta_{k2} = 0$ ($k = 1,2,3$), was rejected in every
replication for all $\alpha = .10$, $.05$ and $.01$. Hence,
$e^*_{MEB.\alpha} = e^*_{EB}$ and $e^{*+}_{MEB.\alpha} = e^{*+}_{EB}$.




the same when the prior variance is equal to or greater than the sampling variance
when $D_2 = 0$. For example, when $\sigma^2 = 1$ and $\tau^2 = 4$, $e^*_{RMLE}$ performs poorly when
compared to the other estimators. The increased prior variability diminishes the
importance of the prior mean. However, when $D_2 \neq 0$, the estimators $e^*_{RMLE}$ and
$e^*_{RLSE}$ perform poorly, as anticipated. The relative magnitude of the prior variances
seems to play a less important role there. But the empirical Bayes estimators
continue to maintain their superiority over $e^*_{LSE}$. As expected, this domination
is more heavily pronounced when $D_2 = 0$ and the prior variability is small when
compared to the sampling variability.


CHAPTER  5 

CONCLUSIONS 


5.1  Summary  and  Conclusions 

This study provides methods for estimating the response function and multivariate
regression models where there is a choice between a full model and a reduced
model. The classical approach is to test a hypothesis. If the null hypothesis is rejected
at a desired level of significance, the full model is used; otherwise, the reduced
model is chosen. However, the above procedure has the drawback that there is no
general way of incorporating the degree of evidence for or against the null hypothesis
in the response estimator. Empirical Bayes estimators of $X\beta$ in the response function
and $(X_1\beta_1, \ldots, X_m\beta_m)$ in the multivariate models are developed in this study.
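The classical test-then-choose procedure described above can be written down compactly for a single response vector. The sketch below is only an illustration of that idea in an ordinary regression with the usual $F$ test of $H_0: \beta_2 = 0$; it is not the multivariate procedure of Chapters 3 and 4. The function name `preliminary_test_estimate` and the argument `f_crit` (the critical value, supplied by the user) are hypothetical.

```python
import numpy as np

def preliminary_test_estimate(y, X1, X2, f_crit):
    """Return (estimate of the mean vector, F ratio) after testing beta_2 = 0."""
    X = np.hstack([X1, X2])
    n, p = X.shape
    q = X2.shape[1]
    # Least squares fits under the full and the reduced model.
    fit_full = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    fit_reduced = X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]
    sse_full = np.sum((y - fit_full) ** 2)
    sse_reduced = np.sum((y - fit_reduced) ** 2)
    # Usual F ratio for the q extra coefficients.
    F = ((sse_reduced - sse_full) / q) / (sse_full / (n - p))
    # All-or-nothing choice: full model if H0 is rejected, reduced otherwise.
    return (fit_full if F > f_crit else fit_reduced), F

# Usage on the replicated 2^3 design with no true interaction effects.
lv = np.array([1.0, -1.0])
x1, x2, x3 = np.repeat(lv, 8), np.tile(np.repeat(lv, 4), 2), np.tile(np.repeat(lv, 2), 4)
X1 = np.column_stack([np.ones(16), x1, x2, x3])
X2 = np.column_stack([x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3])
rng = np.random.default_rng(1)
y = X1 @ np.array([6.0, -7.0, 8.0, -6.0]) + rng.standard_normal(16)
estimate, F = preliminary_test_estimate(y, X1, X2, f_crit=3.84)  # approx. 5% point of F(4, 8)
print(estimate.shape, F > 3.84)
```

The estimate jumps from one fit to the other as $F$ crosses the critical value, which is the drawback noted above: the strength of the evidence plays no role beyond the accept or reject decision.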

The structure of the corresponding prior distribution of $\beta$ or $(\beta_1, \ldots, \beta_m)$ is motivated
by the associated null hypothesis. The prior distribution in the multivariate
regression model II is adopted from Reinsel (1985) and a new reduced model is proposed
in Chapter 4. All the above priors are still Zellner's (1986) g-priors, obtained by
setting the prior variance-covariance matrix $\Sigma$ ($\Sigma_k$, $k = 1, \ldots, m$) as
$\tau^2 (X^T X)^{-1}$ ($\tau^2 (X_k^T X_k)^{-1}$, $k = 1, \ldots, m$). The hierarchical Bayes estimators of $X\beta$ are only
introduced in Chapter 2.

These empirical Bayes and hierarchical Bayes estimators are weighted averages
of LSE's of $(X_1\beta_1, \ldots, X_m\beta_m)$ based on the full model and the reduced model,
or of the LSE based on the full model and a weighted average of LSE's for the
reduced model (see multivariate regression model II). These weights are data dependent,
and are such that the proposed estimators lean more towards the LSE of
the full model if the F ratio for the corresponding hypothesis test is large, and to




the  LSE  of  the  reduced  model,  or  the  weighted  average  of  LSE's  of  the  reduced 
model,  when  the  opposite  is  true. 
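The qualitative behaviour of such data-dependent weights can be illustrated with a generic Stein-type rule. The sketch below is an assumption made for illustration, not the weight function derived in this study: it uses the weight $1 - c/F$ on the full-model fit, with an arbitrary constant `c`, and optionally truncates the weight at zero (the positive-part version). The name `shrink_toward_reduced` is hypothetical.

```python
import numpy as np

def shrink_toward_reduced(fit_full, fit_reduced, F, c=1.0, positive_part=True):
    """Weighted average of two fits with a Stein-type weight 1 - c/F (illustrative)."""
    if F <= 0:
        raise ValueError("F ratio must be positive")
    w = 1.0 - c / F                      # close to 1 when F is large
    if positive_part:
        w = max(w, 0.0)                  # never move past the reduced-model fit
    fit_full = np.asarray(fit_full, dtype=float)
    fit_reduced = np.asarray(fit_reduced, dtype=float)
    return fit_reduced + w * (fit_full - fit_reduced)

full = np.array([4.0, 4.0])
reduced = np.array([0.0, 0.0])
print(shrink_toward_reduced(full, reduced, F=4.0))    # weight 0.75 on the full fit
print(shrink_toward_reduced(full, reduced, F=0.5))    # weight truncated to 0
```

A large $F$ ratio leaves the estimate near the full-model fit, a small one pulls it to the reduced-model fit, and intermediate values give a genuine compromise, in contrast to the all-or-nothing choice made by a preliminary test.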

Several conclusions relating to the EB estimators are summarized as follows.

(1) Under a general quadratic loss, these empirical and hierarchical Bayes estimators
are evaluated by their frequentist and Bayes risks. It has been shown
that under certain conditions empirical Bayes estimators dominate the minimax
estimator of $(X_1\beta_1, \ldots, X_m\beta_m)$, the LSE of the full model, when comparisons
are based on the frequentist criterion (see Corollary 2.2.1, Corollary 2.2.2, Corollary
3.2.1, and Corollary 4.2.1). The EB estimators then provide useful minimax
estimators of $X\beta$.

(2) If the quadratic loss is specified by $Q = V^{-1}$ ($Q_k = V_k^{-1}$), empirical Bayes
estimators turn out to be fairly robust when compared, using the frequentist risks,
to the LSE of the reduced model and the weighted average of the LSE's of the
reduced model.

(3) In Chapter 2, empirical Bayes estimators are also shown to dominate the estimator
of $X\beta$ proposed by Sclove (1968), in both the frequentist and
Bayes senses (see Theorem 2.2.2 and Theorem 2.2.7).

(4)  By  using  the  argument  in  Theorem  6.2  of  Lehmann  (1983),  it  has  been 
proved  that  the  frequentist  risk  of  the  positive  part  of  empirical  Bayes  estimators 
is  less  than  that  of  empirical  Bayes  estimators  (see  Theorem  2.2.3,  Theorem  3.2.2, 
and  Theorem  4.2.2). 

(5) The preliminary test estimators provide other compromises between the
LSE of the full model and the LSE of the reduced model or the weighted average
of the LSE's in the reduced model. For every preliminary test estimator of $X\beta$,
the corresponding modified empirical Bayes estimators




dominate  the  preliminary  test  estimators  (see  Theorem  2.2.4,  Theorem  3.2.3,  and 
Theorem  4.2.3). 

(6) When the evaluation is based on the Bayes criterion, under a general quadratic
loss, empirical Bayes estimators are superior to the LSE of the full model (see Theorem
2.2.5, Theorem 3.2.4, and Theorem 4.2.4).

(7) If the sampling variability is smaller than the prior variability, empirical
Bayes estimators dominate the LSE of the reduced model or the weighted average
of the LSE's for the reduced model. However, if the prior variability is relatively
smaller than the sampling variability, then the LSE of the reduced model or the
weighted average of the LSE's for the reduced model can become a more appropriate
estimator (see Theorem 2.2.6, Theorem 3.2.5, and Theorem 4.2.5).

5.2  Future  Studies 

There  are  several  open  problems  in  this  area  that  are  worth  further  study.  First, 
instead  of  using  Zellner’s  (1986)  g-prior,  one  can  use  other  priors.  An  interesting 
question  would  be  to  study  the  robustness  of  the  estimators  developed  under  the 
choice  of  different  priors.  Finally,  hierarchical  Bayes  estimators  should  also  be 
developed  in  the  multivariate  regression  model,  and  their  robustness  properties 
should  be  studied. 


BIBLIOGRAPHY 


Arnold, S. F., 1981. “The Theory of Linear Models and Multivariate Analysis,”
John Wiley & Sons Inc., New York.

Bancroft, T. A., 1944. “On Biases in Estimation due to the Use of Preliminary
Tests of Significance,” Ann. Math. Statist., 15, 190-204.

Baranchik, A. J., 1970. “A Family of Minimax Estimators of the Mean of a Multivariate
Normal Distribution,” Ann. Math. Statist., 41, 642-645.

Baranchik, A. J., 1973. “Inadmissibility of Maximum Likelihood Estimators in
some Multiple Regression Problems with Three or More Indep. Variables,”
Ann. Statist., 1, 312-321.

Berger,  J.,  1976.  “Admissible  Minimax  Estimation  of  a Multivariate  Normal  Mean 
with  Arbitrary  Quadratic  Loss,”  Ann.  Statist.,  4,  223-226. 

Berger,  J.,  1982.  “Selecting  a Minimax  Estimator  of  a Multivariate  Normal 
Mean,”  Ann.  Statist.,  10,  81-92. 

Berger, J., and L. R. Haff, 1983. “A Class of Minimax Estimators of a Normal
Mean Vector for Arbitrary Quadratic Loss and Unknown Covariance
Matrix,” Statistics & Decisions, 1, 105-130.

Bock,  M.  E.,  1975.  “Minimax  Estimations  of  the  Mean  of  a Multivariate  Normal 
Distribution,”  Ann.  Statist.,  3,  209-218. 

Efron,  B.,  and  C.  Morris,  1975.  “Data  Analysis  using  Stein’s  Estimator  and  its 
Generalizations,”  Journal  of  the  American  Statistical  Association,  70,  311- 
319. 

Efron,  B.,  and  C.  Morris,  1976.  “Families  of  Minimax  Estimators  of  the  Mean  of 
a Multivariate  Normal  Distribution,”  Ann.  Statist.,  4,  11-21. 

Ghosh, M., A. K. MD. E. Saleh, and P. K. Sen, 1987. “Empirical Bayes Subset
Estimation in Regression Model,” Dept. of Stat., U. of Florida, Gainesville,
FL., Technical report no. 281.

James,  W.,  and  C.  Stein,  1961,  “Estimation  with  Quadratic  Loss,”  Proc.  Fourth 
Berkeley  Symp.  Math.  Statist.  Prob.,  1,  361-379,  U.  of  California  Press, 
Berkeley,  CA. 


Jeffreys,  H.,  1967,  “Theory  of  Probability,”  Oxford  University  Press,  London. 




Judge,  G.  G.,  R.  C.  Hill,  and  M.  E.  Bock,  1988.  “Estimation  of  the  Multivariate 
Normal  Mean  under  Quadratic  Loss,”  preprint. 

Khuri, A. I., and J. A. Cornell, 1987. “Response Surfaces: Designs and Analyses,”
Marcel Dekker, Inc., New York.

Lehmann, E. L., 1983. “Theory of Point Estimation,” John Wiley & Sons, Inc.

Morris,  C.,  1977.  “Interval  Estimation  for  Empirical  Bayes  Generalizations  of 
Stein’s  estimator,”  Proceedings  of  the  Twenty-Second  Conference  on  the 
Design  of  Experiments  in  Army  Research  Development  and  Testing,  ARC 
Report  77-2. 

Muth,  J.  F.,  1961.  “Rational  Expectations  and  the  Theory  of  Price  Movements,” 
Econometrica,  29,  315-335. 


Myers, R. H., 1976. “Response Surface Methodology,” Dept. of Stat., Virginia
Tech., Blacksburg, VA., Distributed by Edwards Brothers, Inc., Ann
Arbor, Michigan.

Rao,  C.  R.,  1973.  “Linear  Statistical  Inference  and  its  Application,”  2nd  Edition, 
Wiley,  New  York. 

Reinsel, G. C., 1985. “Mean Squared Error Properties of Empirical Bayes Estimators
in a Multivariate Random Effects General Linear Model,” Journal
of the American Statistical Association, 80, 642-650.


Sclove,  S.  L.,  1968.  “Improved  Estimators  for  Coefficients  in  Linear  Regression,” 
Journal  of  the  American  Statistical  Association,  63,  596-606. 

Stein, C., 1956. “Inadmissibility of the Usual Estimator for the Mean of a Multivariate
Normal Distribution,” Proc. Third Berkeley Symp. Math. Statist.
Prob., 1, 197-206, U. of California Press.


Stein, C., 1960. “Multiple Regression,” Contributions to Probability and Statistics,
Essays in Honor of Harold Hotelling, edited by Olkin, I., Stanford U. Press,
Stanford, CA.


Strawderman, W. E., 1971. “Proper Bayes Minimax Estimators of the Multivariate
Normal Mean,” Annals of Mathematical Statistics, 42, 385-388.

Zellner, A., 1986. “On Assessing Prior Distributions and Bayesian Regression
Analysis with g-Prior Distributions,” Bayesian Inference and Decision Techniques,
edited by Goel, P. and Zellner, A., Elsevier Science Publishers
B.V.


BIOGRAPHICAL  SKETCH 


Li-Chu Lee was born on March 1, 1957, in Taiwan, the Republic of China. She
received her Bachelor of Business Science degree in communication and transportation
management science from the National Cheng-Kung University, Taiwan, the
Republic of China, in 1979. She then worked as a business officer in the governmental
telephone company for twenty months. She was married to Mr. Li-Hwa Lin
in  July  1980.  She  came  to  the  United  States  and  joined  her  husband  who  was  a 
graduate  student  at  Oregon  State  University  in  April  1981.  She  entered  the  same 
University  as  a graduate  student  in  1982  and  graduated  with  a master’s  degree  in 
operations  research  in  1983.  In  August,  1984,  she  was  accepted  into  the  graduate 
school  at  the  University  of  Florida  in  the  Department  of  Statistics.  She  has  worked 
as  a teaching  assistant  as  well  as  a research  assistant  while  earning  the  Doctor  of 
Philosophy  degree. 




I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality, as a
dissertation for the degree of Doctor of Philosophy.


Malay Ghosh, Chairman
Professor  of  Statistics 


I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality, as a
dissertation for the degree of Doctor of Philosophy.


Michael  A.  DeLorenzo 

Associate  Professor  of  Dairy  Science 


I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality, as a
dissertation for the degree of Doctor of Philosophy.


Ramon  C.  Littell 
Professor  of  Statistics 


This dissertation was submitted to the Graduate Faculty of the Department of Statistics
in the College of Liberal Arts and Science and to the Graduate School and was accepted
as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

May,  1989 


Dean,  Graduate  School 


